Abstract

The frequentist method of simulated minimum distance (SMD) is widely used in economics 
to estimate complex models with an intractable likelihood. In other disciplines, a Bayesian 
approach known as Approximate Bayesian Computation (ABC) is far more popular. This 
paper connects these two seemingly related approaches to likelihood-free estimation by means of 
a Reverse Sampler that uses both optimization and importance weighting to target the posterior 
distribution. Its hybrid features enable us to analyze an ABC estimate from the perspective of 
SMD. We show that an ideal ABC estimate can be obtained as a weighted average of a sequence 
of SMD modes, each being the minimizer of the deviations between the data and the model. 
This contrasts with the SMD, which is the mode of the average deviations. Using stochastic 
expansions, we provide a general characterization of frequentist estimators and those based on 
Bayesian computations including Laplace-type estimators. Their differences are illustrated using 
analytical examples and a simulation study of the dynamic panel model. 
1 Introduction


As knowledge accumulates, scientists and social scientists incorporate more and more features into 
their models to have a better representation of the data. The increased model complexity comes at 
a cost; the conventional approach of estimating a model by writing down its likelihood function is 
often not possible. Different disciplines have developed different ways of handling models with an 
intractable likelihood. An approach popular amongst evolutionary biologists, geneticists, ecologists, 
psychologists and statisticians is Approximate Bayesian Computation (ABC). This work is largely 
unknown to economists who mostly estimate complex models using frequentist methods that we 
generically refer to as the method of Simulated Minimum Distance (SMD), and which include such 
estimators as Simulated Method of Moments, Indirect Inference, or Efficient Method of Moments.

The ABC and SMD share the same goal of estimating parameters $\theta$ using auxiliary statistics
$\hat\psi$ that are informative about the data. An SMD estimator minimizes the $L_2$ distance between
$\hat\psi$ and an average of the auxiliary statistics simulated under $\theta$, and this distance can be made as
close to zero as machine precision permits. An ABC estimator evaluates the distance between $\hat\psi$
and the auxiliary statistics simulated for each $\theta$ drawn from a proposal distribution. The posterior
mean is then a weighted average of the draws that satisfy a distance threshold of $\delta > 0$. There are
many ABC algorithms, each differing according to the choice of the distance metric, the weights,
and sampling scheme. But the algorithms can only approximate the desired posterior distribution
because $\delta$ cannot be zero, or even too close to zero, in practice.

While both SMD and ABC use simulations to match $\psi(\theta)$ to $\hat\psi$ (hence likelihood-free), the
relation between them is not well understood beyond the fact that they are asymptotically equivalent
under suitable conditions. To make progress, we focus on the MCMC-ABC algorithm due to
Marjoram et al. (2003). The algorithm applies uniform weights to those $\theta$ satisfying
$\|\hat\psi - \hat\psi(\theta)\| \leq \delta$ and zero otherwise. Our main insight is that this $\delta$ can be made very close to zero if we combine
optimization with Bayesian computations. In particular, the desired ABC posterior distribution
can be targeted using a ‘Reverse Sampler’ (or RS for short) that applies importance weights to a
sequence of SMD solutions. Hence, seen from the perspective of the RS, the ideal MCMC-ABC
estimate with $\delta = 0$ is a weighted average of SMD modes. This offers a useful contrast with the
SMD estimate, which is the mode of the average deviations between the model and the data. We
then use stochastic expansions to study sources of variations in the two estimators in the case
of exact identification. The differences are illustrated using simple analytical examples as well as
simulations of the dynamic panel model.
Optimization of models with a non-smooth objective function is challenging, even when the
model is not complex. The Quasi-Bayes (LT) approach due to Chernozhukov and Hong (2003)
uses Bayesian computations to approximate the mode of a likelihood-free objective function. Its
validity rests on the Laplace (asymptotic normal) approximation of the posterior distribution with
the goal of valid asymptotic frequentist inference. The simulation analog of the LT (which we call
SLT) further uses simulations to approximate the intractable relation between the model and the
data. We show that both the LT and SLT can also be represented as a weighted average of modes
with appropriately defined importance weights.

A central theme of our analysis is that the mean computed from many likelihood-free posterior
distributions can be seen as a weighted average of solutions to frequentist objective functions.
Optimization permits us to turn the focus from computational to analytical aspects of the posterior
mean, and thus provides a bridge between the seemingly related approaches. Although our
optimization-based samplers are not intended to compete with the many ABC algorithms that are
available, they can be useful in situations when numerical optimization of the auxiliary model is
fast. This aspect is studied in our companion paper Forneron and Ng (2015) in which implementation
of the RS in the overidentified case is also considered. The RS is independently proposed in
Meeds and Welling (2015) with suggestions for efficient and parallel implementations. Our focus
on the analytical properties complements their analysis.

The paper proceeds as follows. After laying out the preliminaries in Section 2, Section 3 presents
the general idea behind ABC and introduces an optimization view of the ideal MCMC-ABC. Section
4 considers Quasi-Bayes estimators and interprets them from an optimization perspective. Section
5 uses stochastic expansions to study the properties of the estimators. Section 6 uses analytical
examples and simulations to illustrate their differences. Throughout, we focus the discussion on
features that distinguish the SMD from ABC which are lesser known to economists.²

2 Preliminaries 


As a matter of notation, we use $L(\cdot)$ to denote the likelihood, $p(\cdot)$ to denote posterior densities, $q(\cdot)$
for proposal densities, and $\pi(\cdot)$ to denote prior densities. A ‘hat’ denotes estimators that correspond
to the mode and a ‘bar’ is used for estimators that correspond to the posterior mean. We use $(s, S)$
and $(b, B)$ to denote the (specific, total number of) draws in frequentist and Bayesian type analyses


2 The class of SMD estimators considered are well known in the macro and finance literature and with apologies,
many references are omitted. We also do not consider discrete choice models; though the idea is conceptually similar,
the implementation requires different analytical tools. Smith (2008) provides a concise overview of these methods.
The finite sample properties of the estimators are studied in Michaelides and Ng (2000). Readers are referred to the
original paper concerning the assumptions used.
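respectively. A superscript $s$ denotes a specific draw and a subscript $S$ denotes the average over
$S$ draws. For a function $f(\theta)$, we use $f_\theta(\theta_0)$ to denote $\frac{\partial}{\partial\theta} f(\theta)$ evaluated at $\theta_0$,
$f_{\theta\theta_j}(\theta_0)$ to denote $\frac{\partial}{\partial\theta_j} f_\theta(\theta)$ evaluated at $\theta_0$, and $f_{\theta,\theta_j,\theta_k}(\theta_0)$ to denote
$\frac{\partial^2}{\partial\theta_j\partial\theta_k} f_\theta(\theta)$ evaluated at $\theta_0$.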

Throughout, we assume that the data $y = (y_1, \ldots, y_T)'$ are covariance stationary and can
be represented by a parametric model with probability measure $\mathcal{P}_\theta$ where $\theta \in \Theta \subset \mathbb{R}^K$. There
is a unique model parameterized by $\theta_0$. Unless otherwise stated, we write $E[\cdot]$ for expectations
taken under $\mathcal{P}_{\theta_0}$ instead of $E_{\mathcal{P}_{\theta_0}}[\cdot]$. If the likelihood $L(\theta) = L(\theta|y)$ is tractable, maximizing the
log-likelihood $\ell(\theta) = \log L(\theta)$ with respect to $\theta$ gives

$$\hat\theta_{ML} = \arg\max_\theta \ell(\theta).$$

Bayesian estimation combines the likelihood with a prior $\pi(\theta)$ to yield the posterior density

$$p(\theta|y) = \frac{L(\theta)\pi(\theta)}{\int_\Theta L(\theta)\pi(\theta)\,d\theta}. \tag{1}$$


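For any prior $\pi(\theta)$, it is known that $\hat\theta_{ML}$ solves

$$\arg\max_\theta \ell(\theta) = \lim_{\lambda\to\infty} \frac{\int_\Theta \theta \exp(\lambda\ell(\theta))\pi(\theta)\,d\theta}{\int_\Theta \exp(\lambda\ell(\theta))\pi(\theta)\,d\theta}.$$

That is, the maximum likelihood estimator is a limit of the Bayes estimator using $\lambda \to \infty$ replications
of the data $y$.³ The parameter $\lambda$ is the cooling temperature in simulated annealing, a stochastic
optimizer due to Kirkpatrick et al. (1983) for handling problems with multiple modes.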
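The limit above can be checked numerically on a toy problem. The sketch below is only an illustration under assumptions that are not in the paper: the data are N(θ, 1), the prior is N(1, 1), and the integrals are replaced by sums over a grid. The function name `tempered_posterior_mean` and all tuning values are hypothetical.

```python
import math
import random

def tempered_posterior_mean(y, lam, grid):
    """Mean of theta under the density proportional to exp(lam * loglik) * prior."""
    def loglik(theta):
        return -0.5 * sum((yi - theta) ** 2 for yi in y)   # N(theta, 1) data

    def logprior(theta):
        return -0.5 * (theta - 1.0) ** 2                   # N(1, 1) prior, deliberately off-centre

    logw = [lam * loglik(t) + logprior(t) for t in grid]
    m = max(logw)                                          # subtract the max for numerical stability
    w = [math.exp(v - m) for v in logw]
    return sum(t * wi for t, wi in zip(grid, w)) / sum(w)

random.seed(0)
y = [random.gauss(0.0, 1.0) for _ in range(20)]
theta_ml = sum(y) / len(y)                                 # MLE of a normal mean
grid = [-3 + 6 * i / 6000 for i in range(6001)]
for lam in (1, 10, 100):
    print(lam, tempered_posterior_mean(y, lam, grid), theta_ml)
```

As λ grows, the prior is swamped and the tempered posterior mean moves toward the sample mean, which is the maximum likelihood estimate in this toy model.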

In the case of conjugate problems, the posterior distribution has a parametric form which makes
it easy to compute the posterior mean and other quantities of interest. For non-conjugate problems,
the method of Markov Chain Monte Carlo (MCMC) allows sampling from a Markov Chain whose
ergodic distribution is the target posterior distribution $p(\theta|y)$, without the need to compute the
normalizing constant. We use the Metropolis-Hastings (MH) algorithm in subsequent discussion.
In classical Bayesian estimation with proposal density $q(\cdot)$, the acceptance ratio is

$$\rho_{BC}(\theta^b, \theta^{b+1}) = \min\left(\frac{L(\theta^{b+1})\pi(\theta^{b+1})q(\theta^b|\theta^{b+1})}{L(\theta^b)\pi(\theta^b)q(\theta^{b+1}|\theta^b)},\ 1\right).$$

When the posterior mode $\hat\theta_{BC} = \arg\max_\theta p(\theta|y)$ is difficult to obtain, the posterior mean

$$\bar\theta_{BC} = \frac{1}{B}\sum_{b=1}^B \theta^b \approx \int_\Theta \theta\, p(\theta|y)\,d\theta$$

is often the reported estimate, where $\theta^b$ are draws from the Markov Chain upon convergence. Under
quadratic loss, the posterior mean minimizes the posterior risk $Q(a) = \int_\Theta |\theta - a|^2 p(\theta|y)\,d\theta$.
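A minimal sketch of the MH recursion and the resulting posterior mean is given below. It assumes a toy conjugate setting that is not taken from the paper (N(θ, 1) data, N(0, 10²) prior, random-walk proposal), chosen so that the simulated mean can be compared with a closed form. All function names and tuning values are hypothetical.

```python
import math
import random

def log_posterior(theta, y, prior_sd=10.0):
    """log of L(theta) * pi(theta), up to a constant."""
    loglik = -0.5 * sum((yi - theta) ** 2 for yi in y)
    logprior = -0.5 * (theta / prior_sd) ** 2
    return loglik + logprior

def metropolis_hastings(y, n_draws, step, theta0, rng):
    """Random-walk MH; the proposal is symmetric, so the q-ratio equals one."""
    theta, lp = theta0, log_posterior(theta0, y)
    draws = []
    for _ in range(n_draws):
        proposal = theta + rng.gauss(0.0, step)
        lp_prop = log_posterior(proposal, y)
        accept_prob = math.exp(min(0.0, lp_prop - lp))     # min(ratio, 1)
        if rng.random() < accept_prob:
            theta, lp = proposal, lp_prop
        draws.append(theta)                                # a rejected proposal repeats theta^b
    return draws

rng = random.Random(1)
y = [rng.gauss(0.5, 1.0) for _ in range(20)]
draws = metropolis_hastings(y, n_draws=20000, step=0.5, theta0=0.0, rng=rng)
kept = draws[2000:]                                        # discard burn-in
posterior_mean = sum(kept) / len(kept)
exact_mean = sum(y) / (len(y) + 1.0 / 10.0 ** 2)           # conjugate closed form
print(posterior_mean, exact_mean)
```

Only ratios of the unnormalized posterior enter the acceptance step, which is why the normalizing constant in (1) is never needed.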


3 See Robert and Casella (2004, Corollary 5.11), Jacquier et al. (2007).
2.1 Minimum Distance Estimators

The generalized method of moments (GMM) is a likelihood-free frequentist estimator
developed in Hansen (1982); Hansen and Singleton (1982). It allows, for example, the estimation
of $K$ parameters in a dynamic model without explicitly solving the full model. It is based on a
vector of $L \geq K$ moment conditions $g_t(\theta)$ whose expected value is zero at $\theta = \theta_0$, ie. $E[g_t(\theta_0)] = 0$.
Let $\bar g(\theta) = \frac{1}{T}\sum_{t=1}^T g_t(\theta)$ be the sample analog of $E[g_t(\theta)]$. The estimator is

$$\hat\theta_{GMM} = \arg\min_\theta J(\theta), \qquad J(\theta) = \frac{T}{2}\,\bar g(\theta)' W \bar g(\theta) \tag{2}$$


where $W$ is an $L \times L$ positive-definite weighting matrix. Most estimators can be put in the GMM
framework with suitable choice of $g_t$. For example, when $g_t$ is the score of the likelihood, the
maximum likelihood estimator is obtained.

Let $\hat\psi = \hat\psi(y(\theta_0))$ be $L$ auxiliary statistics with the property that $\sqrt{T}(\hat\psi - \psi(\theta_0)) \xrightarrow{d} \mathcal{N}(0, \Sigma)$.
It is assumed that the mapping $\psi(\theta) = \lim_{T\to\infty} E[\hat\psi(\theta)]$ is continuously differentiable in $\theta$ and
locally injective at $\theta_0$. Gourieroux et al. (1993) refers to $\psi(\theta)$ as the binding function while Jiang
and Turnbull (2004) uses the term bridge function. The minimum distance estimator is a GMM
estimator which specifies

$$\bar g(\theta) = \hat\psi - \psi(\theta),$$

with efficient weighting matrix $W = \Sigma^{-1}$. Classical MD estimation assumes that the binding
function $\psi(\theta)$ has a closed form expression so that in the exactly identified case, one can solve for
$\theta$ by inverting $\bar g(\theta)$.

2.2 SMD Estimators 

Simulation estimation is useful when the asymptotic binding function $\psi(\theta_0)$ is not analytically
tractable but can be easily evaluated on simulated data. The first use of this approach in economics
appears to be due to Smith (1993). The simulated analog of MD, which we will call SMD, minimizes
the weighted difference between the auxiliary statistics evaluated at the observed and simulated
data:

$$\hat\theta_{SMD} = \arg\min_\theta J_S(\theta) = \arg\min_\theta \bar g_S'(\theta) W \bar g_S(\theta),$$

where

$$\bar g_S(\theta) = \hat\psi - \frac{1}{S}\sum_{s=1}^S \hat\psi^s(y^s(\varepsilon^s, \theta)),$$

$y^s = y^s(\varepsilon^s, \theta)$ are data simulated under $\theta$ with errors $\varepsilon$ drawn from an assumed distribution $F_\varepsilon$,
and $\hat\psi^s(\theta) = \hat\psi^s(y^s(\varepsilon^s, \theta))$ are the auxiliary statistics computed using $y^s$. Of course, $\bar g_S(\theta)$ is also
the average over $S$ deviations between $\hat\psi$ and $\hat\psi^s(y^s(\varepsilon^s, \theta))$. To simplify notation, we will write
$y^s$ and $\hat\psi^s(\theta)$ when the context is clear. As in MD estimation, the auxiliary statistics $\psi(\theta)$ should
‘smoothly embed’ the properties of the data in the terminology of Gallant and Tauchen (1996).
But SMD estimators replace the asymptotic binding function $\psi(\theta_0) = \lim_{T\to\infty} E[\hat\psi(\theta_0)]$ by a finite
sample analog using Monte-Carlo simulations. While the SMD is motivated with the estimation
of complex models in mind, Gourieroux et al. (1999) shows that simulation estimation has a bias
reduction effect like the bootstrap. Hence in the econometrics literature, SMD estimators are used
even when the likelihood is tractable, as in Gourieroux et al. (2010).
The steps for implementing the SMD are as follows:

0 For $s = 1, \ldots, S$, draw $\varepsilon^s = (\varepsilon^s_1, \ldots, \varepsilon^s_T)'$ from $F_\varepsilon$. These are innovations to the structural
model that will be held fixed during iterations.

1 Given $\theta$, repeat for $s = 1, \ldots, S$:

a Use $(\varepsilon^s, \theta)$ and the model to simulate data $y^s = (y^s_1, \ldots, y^s_T)'$.
b Compute the auxiliary statistics $\hat\psi^s(\theta)$ using simulated data $y^s$.

2 Compute: $\bar g_S(\theta) = \hat\psi(y) - \frac{1}{S}\sum_{s=1}^S \hat\psi^s(\theta)$. Minimize $J_S(\theta) = \bar g_S(\theta)' W \bar g_S(\theta)$.


The SMD is the $\theta$ that makes $J_S(\theta)$ smaller than the tolerance specified for the numerical optimizer.
In the exactly identified case, the tolerance can be made as small as machine precision permits.
When $\hat\psi$ is a vector of unconditional moments, the SMM estimator of Duffie and Singleton (1993) is
obtained. When $\hat\psi$ are parameters of an auxiliary model, we have the ‘indirect inference’ estimator
of Gourieroux et al. (1993). These are Wald-test based SMD estimators in the terminology of Smith
(2008). When $\hat\psi$ is the score function associated with the likelihood of the auxiliary model, we have
the EMM estimator of Gallant and Tauchen (1996), which can also be thought of as an LM-test
based SMD. If $\hat\psi$ is the likelihood of the auxiliary model, $J_S(\theta)$ can be interpreted as a likelihood
ratio and we have an LR-test based SMD. Gourieroux and Monfort (1996) provides a framework that
unifies these three approaches to SMD estimation. Nickl and Pötscher (2010) shows that an SMD
based on non-parametrically estimated auxiliary statistics can have asymptotic variance equal to
the Cramer-Rao bound if the tuning parameters are optimally chosen.
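The SMD steps can be sketched on a toy model. The example below is not from the paper: it assumes a scale model $y_t = \theta e_t$ with $e_t$ drawn from a unit exponential, uses the sample mean as the single auxiliary statistic, and minimizes $J_S$ with a hand-written golden-section search. All names and tuning values are hypothetical.

```python
import math
import random

def simulate_stat(theta, shocks):
    """Step 1: simulate y_t = theta * e_t and return its sample mean."""
    return sum(theta * e for e in shocks) / len(shocks)

def smd_objective(theta, psi_hat, all_shocks, weight=1.0):
    """Step 2: J_S(theta) = g_S(theta)' W g_S(theta) for a scalar statistic."""
    sim_mean = sum(simulate_stat(theta, s) for s in all_shocks) / len(all_shocks)
    g = psi_hat - sim_mean
    return g * weight * g

def golden_section(f, lo, hi, tol=1e-9):
    """Derivative-free minimizer of a unimodal function on [lo, hi]."""
    invphi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - invphi * (b - a)
        d = a + invphi * (b - a)
        if f(c) < f(d):
            b = d
        else:
            a = c
    return 0.5 * (a + b)

rng = random.Random(2)
T, S, theta_true = 200, 10, 2.0
y = [theta_true * rng.expovariate(1.0) for _ in range(T)]
psi_hat = sum(y) / T
# Step 0: draw the innovations once and hold them fixed across evaluations.
all_shocks = [[rng.expovariate(1.0) for _ in range(T)] for _ in range(S)]
theta_smd = golden_section(lambda th: smd_objective(th, psi_hat, all_shocks), 0.01, 10.0)
print(theta_smd)
```

Holding the shocks fixed makes $J_S(\theta)$ a deterministic function of $\theta$, so a standard optimizer can be applied. In this toy model the minimizer also has a closed form, $\hat\psi$ divided by the average simulated shock, which gives a convenient check.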

The Wald, LM, and LR based SMD estimators minimize a weighted $L_2$ distance between the
data and the model as summarized by auxiliary statistics. Creel and Kristensen (2013) considers a
class of estimators that minimize the Kullback-Leibler distance between the model and the data.
Within this class, their MIL estimator maximizes an ‘indirect likelihood’, defined as the likelihood
of the auxiliary statistics. Their BIL estimator uses Bayesian computations to approximate the
mode of the indirect likelihood. In practice, the indirect likelihood is unknown. A SBIL estimator
then uses Bayesian computations in conjunction with simulations and non-parametric estimation.
The latter step estimates the indirect likelihood by kernel smoothing of the simulated data. Gao
and Hong (2014) shows that using local linear regressions instead of kernel estimation can reduce
the variance and the bias. These SBIL estimators actually correspond to two implementations of
ABC considered in Beaumont et al. (2002). The SBIL provides a link between ABC and the SMD
to the extent that the SBIL can be seen as a Kullback-Leibler distance-based SMD estimator. In
the sequel, we take the more conventional $L_2$ definition of SMD as given above.

3 Approximate Bayesian Computations 

The ABC literature often credits Donald Rubin to be the first to consider the possibility of
estimating the posterior distribution when the likelihood is intractable. Diggle and Gratton (1984)
proposes to approximate the likelihood by simulating the model at each point on a parameter grid
and appears to be the first implementation of simulation estimation for models with intractable
likelihoods. Subsequent developments adapted the idea to conduct posterior inference, giving the prior
an explicit role. The first ABC algorithm was implemented by Tavare et al. (1997) and Pritchard
et al. (1996) to study population genetics. Their Accept/Reject algorithm is as follows: (i) draw $\theta^b$
from the prior distribution $\pi(\theta)$, (ii) simulate data using the model under $\theta^b$, (iii) accept $\theta^b$ if the
auxiliary statistics computed using the simulated data are close to $\hat\psi$. As in the SMD literature, the
auxiliary statistics can be parameters of a regression or unconditional sample moments. Heggland
and Frigessi (2004), Drovandi et al. (2011, 2015) use simulated auxiliary statistics.
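The three Accept/Reject steps can be sketched as follows. The setting is an assumption made for illustration only: a scale model $y_t = \theta e_t$ with unit-exponential $e_t$, the sample mean as auxiliary statistic, a uniform prior on (0, 5), and a tolerance of 0.1. The function name `abc_rejection` is hypothetical.

```python
import random

def abc_rejection(psi_hat, T, n_draws, delta, prior_upper, rng):
    """Accept/Reject ABC with a uniform prior on (0, prior_upper)."""
    accepted = []
    for _ in range(n_draws):
        theta = rng.uniform(0.0, prior_upper)                       # (i) draw from the prior
        y_sim = [theta * rng.expovariate(1.0) for _ in range(T)]    # (ii) simulate data
        psi_sim = sum(y_sim) / T
        if abs(psi_sim - psi_hat) <= delta:                         # (iii) keep if close
            accepted.append(theta)
    return accepted

rng = random.Random(3)
T, theta_true = 50, 2.0
y = [theta_true * rng.expovariate(1.0) for _ in range(T)]
psi_hat = sum(y) / T
accepted = abc_rejection(psi_hat, T, n_draws=5000, delta=0.1, prior_upper=5.0, rng=rng)
abc_mean = sum(accepted) / len(accepted)
print(len(accepted), abc_mean)
```

Most prior draws are rejected, which illustrates why sampling from an uninformative prior is inefficient. In this toy model a short calculation suggests that the exact posterior mean under the flat prior is approximately $\hat\psi\, T/(T-2)$, which the accepted draws should roughly reproduce.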


Since simulating from a non-informative prior distribution is inefficient, subsequent work
suggests to replace the rejection sampler by one that takes into account the features of the posterior
distribution. The general idea is to set as a target the intractable posterior density

$$p_{ABC}(\theta|\hat\psi) \propto \pi(\theta) L(\hat\psi|\theta)$$

and approximate it using Monte-Carlo methods. Some algorithms are motivated from the
perspective of non-parametric density estimation, while others aim to improve properties of the
Markov chain.⁴ The main idea is, however, using data augmentation to consider the joint
density $p_{ABC}(\theta, x|\hat\psi) \propto L(\hat\psi|x, \theta) L(x|\theta) \pi(\theta)$, putting more weight on the draws with $x$ close to $\hat\psi$.
When $x = \hat\psi$, $L(\hat\psi|\hat\psi, \theta)$ is a constant, $p_{ABC}(\theta, \hat\psi|\hat\psi) \propto L(\hat\psi|\theta)\pi(\theta)$ and the target posterior is
recovered. If $\hat\psi$ are sufficient statistics, one recovers the posterior distribution associated with the
intractable likelihood $L(\theta|y)$, not just an approximation.


4 Recent surveys on ABC can be found in Marin et al. (2012), Blum et al. (2013) among others. See Drovandi et
al. (2015, 2011) for differences amongst ABC estimators.
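To better understand the ABC idea and its implementation, we will write $y^b$ instead of $y^b(\varepsilon^b, \theta^b)$
and $\hat\psi^b$ instead of $\hat\psi^b(y^b(\varepsilon^b, \theta^b))$ to simplify notation. Let $K_\delta(\hat\psi^b, \hat\psi|\theta) \geq 0$ be a kernel function that
weighs deviations between $\hat\psi$ and $\hat\psi^b$ over a window of width $\delta$. Suppose we keep only the draws
that satisfy $\hat\psi^b = \hat\psi$ and hence $\delta = 0$. Note that $K_0(\hat\psi^b, \hat\psi|\theta) = 1$ if $\hat\psi = \hat\psi^b$ for any choice of the
kernel function. Once the likelihood of interest

$$L(\hat\psi|\theta) = \int L(x|\theta) K_0(x, \hat\psi|\theta)\,dx$$

is available, moments and quantiles can be computed. In particular, for any measurable function
$\varphi$ whose expectation exists, we have:

$$E\left[\varphi(\theta)\,\middle|\,\hat\psi = \hat\psi^b\right] = \frac{\int_\Theta \varphi(\theta)\pi(\theta)L(\hat\psi|\theta)\,d\theta}{\int_\Theta \pi(\theta)L(\hat\psi|\theta)\,d\theta} = \frac{\int_\Theta\int \varphi(\theta)\pi(\theta)L(x|\theta)K_0(x, \hat\psi|\theta)\,dx\,d\theta}{\int_\Theta\int \pi(\theta)L(x|\theta)K_0(x, \hat\psi|\theta)\,dx\,d\theta}.$$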


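Since $\hat\psi^b|\theta^b \sim L(\cdot|\theta^b)$, the expectation can be approximated by averaging over draws from $L(\cdot|\theta^b)$.
More generally, draws can be taken from an importance density $q(\cdot)$. In particular,

$$\hat E\left[\varphi(\theta)\,\middle|\,\hat\psi = \hat\psi^b\right] = \frac{\sum_{b=1}^B K_0(\hat\psi^b, \hat\psi|\theta^b)\frac{\pi(\theta^b)}{q(\theta^b)}\varphi(\theta^b)}{\sum_{b=1}^B K_0(\hat\psi^b, \hat\psi|\theta^b)\frac{\pi(\theta^b)}{q(\theta^b)}}.$$

The importance weights are then

$$w_0^b \propto K_0(\hat\psi^b, \hat\psi|\theta^b)\,\frac{\pi(\theta^b)}{q(\theta^b)}.$$

By a law of large numbers, $\hat E[\varphi(\theta)|\hat\psi] \to E[\varphi(\theta)|\hat\psi]$ as $B \to \infty$.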

There is, however, a caveat. When $\hat\psi$ has continuous support, $\hat\psi^b = \hat\psi$ is an event of measure
zero. Replacing $K_0$ with $K_\delta$ where $\delta$ is close to zero yields the approximation:

$$E\left[\varphi(\theta)\,\middle|\,\hat\psi = \hat\psi^b\right] \approx \frac{\int_\Theta\int \varphi(\theta)\pi(\theta)L(x|\theta)K_\delta(x, \hat\psi|\theta)\,dx\,d\theta}{\int_\Theta\int \pi(\theta)L(x|\theta)K_\delta(x, \hat\psi|\theta)\,dx\,d\theta}.$$

Since $K_\delta(\cdot)$ is a kernel function, consistency of the non-parametric estimator for the conditional
expectation of $\varphi(\theta)$ follows from, for example, Pagan and Ullah (1999). This is the approach
considered in Beaumont et al. (2002), Creel and Kristensen (2013) and Gao and Hong (2014). The
case of a rectangular kernel $K_\delta(\hat\psi, \hat\psi^b) = I_{\|\hat\psi - \hat\psi^b\| \leq \delta}$ corresponds to the ABC algorithm proposed in
Marjoram et al. (2003). This is the first ABC algorithm that exploits MCMC sampling. Hence we
refer to it as MCMC-ABC. Our analysis to follow is based on this algorithm. Accordingly, we now
explore it in more detail.
Algorithm MCMC-ABC Let $q(\cdot)$ be the proposal distribution. For $b = 1, \ldots, B$ with $\theta^0$ given,

1 Generate $\theta^{b+1} \sim q(\theta^{b+1}|\theta^b)$.

2 Draw $\varepsilon^{b+1}$ from $F_\varepsilon$ and simulate data $y^{b+1}$. Compute $\hat\psi^{b+1}$.

3 Accept $\theta^{b+1}$ with probability $\rho_{ABC}(\theta^b, \theta^{b+1})$ and set it equal to $\theta^b$ with probability
$1 - \rho_{ABC}(\theta^b, \theta^{b+1})$ where

$$\rho_{ABC}(\theta^b, \theta^{b+1}) = \min\left(I_{\|\hat\psi - \hat\psi^{b+1}\| \leq \delta}\,\frac{\pi(\theta^{b+1})q(\theta^b|\theta^{b+1})}{\pi(\theta^b)q(\theta^{b+1}|\theta^b)},\ 1\right). \tag{3}$$

As with all ABC algorithms, the success of the MCMC-ABC lies in augmenting the posterior with
simulated data $\hat\psi^b$, ie. $p^*_{ABC}(\theta^b, \hat\psi^b|\hat\psi) \propto L(\hat\psi|\theta^b, \hat\psi^b) L(\hat\psi^b|\theta^b)\pi(\theta^b)$. The joint posterior distribution
that the MCMC-ABC would like to target is

$$p^0_{ABC}(\theta^b, \hat\psi^b|\hat\psi) \propto \pi(\theta^b) L(\hat\psi^b|\theta^b)\, I_{\|\hat\psi^b - \hat\psi\| = 0}$$

since integrating out $\varepsilon^b$ would yield $p^*_{ABC}(\theta|\hat\psi)$. But it would not be possible to generate draws
such that $\|\hat\psi^b - \hat\psi\|$ equals zero exactly. Hence as a compromise, the MCMC-ABC algorithm allows
$\delta > 0$ and targets

$$p^\delta_{ABC}(\theta^b, \hat\psi^b|\hat\psi) \propto \pi(\theta^b) L(\hat\psi^b|\theta^b)\, I_{\|\hat\psi^b - \hat\psi\| \leq \delta}.$$

The adequacy of $p^\delta_{ABC}$ as an approximation of $p^0_{ABC}$ is a function of the tuning parameter $\delta$.
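The algorithm can be sketched under simplifying assumptions that are not in the paper: a scale model $y_t = \theta e_t$ with unit-exponential $e_t$, the sample mean as auxiliary statistic, a uniform prior on (0, 5), and a symmetric random-walk proposal. With these choices the prior and proposal ratios in (3) equal one inside the prior support, so the acceptance probability is either one or zero. The function name `mcmc_abc` and all tuning values are hypothetical.

```python
import random

def mcmc_abc(psi_hat, T, n_draws, delta, step, theta0, prior_upper, rng):
    """MCMC-ABC with a uniform prior on (0, prior_upper) and a random-walk proposal."""
    theta = theta0
    chain = []
    for _ in range(n_draws):
        proposal = theta + rng.gauss(0.0, step)                        # step 1
        accept = False
        if 0.0 < proposal < prior_upper:                               # prior is zero outside
            y_sim = [proposal * rng.expovariate(1.0) for _ in range(T)]  # step 2
            psi_sim = sum(y_sim) / T
            accept = abs(psi_sim - psi_hat) <= delta                   # step 3: indicator
        if accept:
            theta = proposal
        chain.append(theta)                                            # rejected: keep theta^b
    return chain

rng = random.Random(4)
T, theta_true = 50, 2.0
y = [theta_true * rng.expovariate(1.0) for _ in range(T)]
psi_hat = sum(y) / T
chain = mcmc_abc(psi_hat, T, n_draws=10000, delta=0.1, step=0.3,
                 theta0=psi_hat, prior_upper=5.0, rng=rng)
kept = chain[1000:]                                                    # discard burn-in
mcmc_mean = sum(kept) / len(kept)
print(mcmc_mean)
```

Note that the likelihood is never evaluated: the simulated statistic and the indicator stand in for it. Shrinking `delta` brings the target closer to the ideal posterior but lowers the acceptance rate, which is the trade-off described above.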


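To understand why this algorithm works, we follow the argument in Sisson and Fan (2011).
If the initial draw $\theta^1$ satisfies $\|\hat\psi - \hat\psi^1\| \leq \delta$, then all subsequent $b > 1$ draws are such that
$I_{\|\hat\psi^b - \hat\psi\| \leq \delta} = 1$ by construction. Furthermore, since we draw $\theta^{b+1}$ and then independently simulate
data $\hat\psi^{b+1}$, the proposal distribution becomes $q(\theta^{b+1}, \hat\psi^{b+1}|\theta^b) = q(\theta^{b+1}|\theta^b) L(\hat\psi^{b+1}|\theta^{b+1})$. The two
observations together imply that

$$\begin{aligned}
I_{\|\hat\psi - \hat\psi^{b+1}\| \leq \delta}\,\frac{\pi(\theta^{b+1})q(\theta^b|\theta^{b+1})}{\pi(\theta^b)q(\theta^{b+1}|\theta^b)}
&= \frac{I_{\|\hat\psi - \hat\psi^{b+1}\| \leq \delta}}{I_{\|\hat\psi - \hat\psi^{b}\| \leq \delta}}\,\frac{\pi(\theta^{b+1})q(\theta^b|\theta^{b+1})}{\pi(\theta^b)q(\theta^{b+1}|\theta^b)}\,\frac{L(\hat\psi^{b+1}|\theta^{b+1})}{L(\hat\psi^b|\theta^b)}\,\frac{L(\hat\psi^b|\theta^b)}{L(\hat\psi^{b+1}|\theta^{b+1})} \\
&= \frac{I_{\|\hat\psi - \hat\psi^{b+1}\| \leq \delta}\,\pi(\theta^{b+1})L(\hat\psi^{b+1}|\theta^{b+1})}{I_{\|\hat\psi - \hat\psi^{b}\| \leq \delta}\,\pi(\theta^b)L(\hat\psi^b|\theta^b)}\,\frac{q(\theta^b|\theta^{b+1})L(\hat\psi^b|\theta^b)}{q(\theta^{b+1}|\theta^b)L(\hat\psi^{b+1}|\theta^{b+1})} \\
&= \frac{p^\delta_{ABC}(\theta^{b+1}, \hat\psi^{b+1}|\hat\psi)}{p^\delta_{ABC}(\theta^b, \hat\psi^b|\hat\psi)}\,\frac{q(\theta^b, \hat\psi^b|\theta^{b+1})}{q(\theta^{b+1}, \hat\psi^{b+1}|\theta^b)}.
\end{aligned}$$

The last equality shows that the acceptance ratio is in fact the ratio of two ABC posteriors times
the ratio of the proposal distribution. Hence the MCMC-ABC effectively targets the joint posterior
distribution $p^\delta_{ABC}$.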











3.1 The Reverse Sampler 


Thus far, we have seen that the SMD estimator is the $\theta$ that makes $\|\hat\psi - \frac{1}{S}\sum_{s=1}^S \hat\psi^s(\theta)\|$ no larger
than the tolerance of the numerical optimizer. We have also seen that the feasible MCMC-ABC
accepts draws $\theta^b$ satisfying $\|\hat\psi - \hat\psi^b(\theta^b)\| \leq \delta$ with $\delta > 0$. To view the MCMC-ABC from a different
perspective, suppose that setting $\delta = 0$ was possible. Then each accepted draw $\theta^b$ would satisfy:

$$\hat\psi^b(\theta^b) = \hat\psi.$$

For fixed $\varepsilon^b$ and assuming that the mapping $\hat\psi^b : \theta \to \hat\psi^b(\theta)$ is continuously differentiable and
one-to-one, the above statement is equivalent to:

$$\theta^b = \arg\min_\theta \left(\hat\psi^b(\theta) - \hat\psi\right)'\left(\hat\psi^b(\theta) - \hat\psi\right).$$

Hence each accepted $\theta^b$ is the solution to a SMD problem with $S = 1$. Next, suppose that instead
of drawing $\theta^b$ from a proposal distribution, we draw $\varepsilon^b$ and solve for $\theta^b$ as above. Since the mapping
$\hat\psi^b$ is invertible by assumption, a change of variable yields the relation between the distribution of
$\hat\psi^b$ and $\theta^b$. In particular, the joint density, say $h(\theta^b, \varepsilon^b)$, is related to the joint density $L(\hat\psi^b(\theta^b), \varepsilon^b)$
via the determinant of the Jacobian $|\hat\psi^b_\theta(\theta^b)|$ as follows:

$$h(\theta^b, \varepsilon^b|\hat\psi) = |\hat\psi^b_\theta(\theta^b)|\, L(\hat\psi^b(\theta^b), \varepsilon^b|\hat\psi).$$

Multiplying the quantity on the right-hand-side by $w^b(\theta^b) = \pi(\theta^b)|\hat\psi^b_\theta(\theta^b)|^{-1}$ yields $\pi(\theta^b)L(\hat\psi, \varepsilon^b|\theta^b)$
since $\hat\psi^b(\theta^b) = \hat\psi$ and the mapping from $\theta^b$ to $\hat\psi^b(\theta^b)$ is one-to-one. This suggests that if we solve
the SMD problem $B$ times each with $S = 1$, re-weighting each of the $B$ solutions by $w^b(\theta^b)$ would
give the target posterior $p^*_{ABC}(\theta|\hat\psi)$ after integrating out $\varepsilon^b$.

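Algorithm RS

1 For $b = 1, \ldots, B$ and a given $\theta$,

i Draw $\varepsilon^b$ from $F_\varepsilon$ and simulate data $y^b$ using $\theta$. Compute $\hat\psi^b(\theta)$ from $y^b$.

ii Let $\theta^b = \arg\min_\theta J^b_1(\theta)$, $J^b_1(\theta) = (\hat\psi - \hat\psi^b(\theta))' W (\hat\psi - \hat\psi^b(\theta))$.

iii Compute the Jacobian $\hat\psi^b_\theta(\theta^b)$ and its determinant $|\hat\psi^b_\theta(\theta^b)|$. Let $w^b(\theta^b) = \pi(\theta^b)|\hat\psi^b_\theta(\theta^b)|^{-1}$.

2 Compute the posterior mean $\bar\theta_{RS} = \sum_{b=1}^B \bar w^b(\theta^b)\theta^b$ where $\bar w^b(\theta^b) = \frac{w^b(\theta^b)}{\sum_{c=1}^B w^c(\theta^c)}$.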
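A minimal sketch of the RS in a scalar, exactly identified toy problem is given below. The setting is an assumption made for illustration and is not the authors' implementation: data follow $y_t = \theta e_t$ with unit-exponential $e_t$, the auxiliary statistic is the sample mean, and the prior is uniform on (0, 5). Because the model is exactly identified, the SMD minimizer with $S = 1$ sets the deviation to zero, so a Newton root-finder is used in place of a general optimizer. All function names are hypothetical.

```python
import random

def solve_scalar(f, x0, tol=1e-10, max_iter=50, h=1e-6):
    """Newton iterations with a central finite-difference derivative."""
    x = x0
    for _ in range(max_iter):
        fx = f(x)
        if abs(fx) < tol:
            break
        x -= fx * (2 * h) / (f(x + h) - f(x - h))
    return x

def reverse_sampler(psi_hat, T, n_draws, prior_upper, rng, h=1e-6):
    thetas, weights = [], []
    for _ in range(n_draws):
        shocks = [rng.expovariate(1.0) for _ in range(T)]      # step i: draw e^b

        def psi_b(theta, shocks=shocks):
            # auxiliary statistic of data simulated as y_t = theta * e_t
            return sum(theta * e for e in shocks) / len(shocks)

        # step ii: drive psi_b(theta) - psi_hat to zero (SMD with S = 1)
        theta_b = solve_scalar(lambda th: psi_b(th) - psi_hat, psi_hat)
        # step iii: finite-difference Jacobian, then weight = prior / |Jacobian|
        jac = (psi_b(theta_b + h) - psi_b(theta_b - h)) / (2 * h)
        prior = 1.0 / prior_upper if 0.0 < theta_b < prior_upper else 0.0
        thetas.append(theta_b)
        weights.append(prior / abs(jac))
    total = sum(weights)
    return sum(w * th for w, th in zip(weights, thetas)) / total  # step 2

rng = random.Random(5)
T, theta_true = 50, 2.0
y = [theta_true * rng.expovariate(1.0) for _ in range(T)]
psi_hat = sum(y) / T
theta_rs = reverse_sampler(psi_hat, T, n_draws=2000, prior_upper=5.0, rng=rng)
print(theta_rs, psi_hat * T / (T - 2))
```

Every draw is used, in contrast with the rejection step of ABC, and each $\theta^b$ matches $\hat\psi$ up to the solver tolerance. In this toy model a short calculation suggests that the exact posterior mean under the flat prior is approximately $\hat\psi\, T/(T-2)$, and the weighted average should be close to it.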

The RS has the optimization aspect of SMD as well as the sampling aspect of the MCMC-ABC. We
call the RS the reverse sampler for two reasons. First, typical Bayesian estimation starts with an
evaluation of the prior probabilities. The RS terminates with evaluation of the prior. Furthermore,
we use the SMD estimates to reverse engineer the posterior distribution.

Consistency of each RS solution (ie. $\theta^b$) is built on the fact that the SMD is consistent even
with $S = 1$. The RS estimate is thus a weighted average of a sequence of SMD modes. In contrast, the SMD
is the mode of an objective function defined from a weighted average of the simulated auxiliary
statistics. Optimization effectively allows $\delta$ to be as close to zero as machine precision permits.
This puts the joint posterior distribution as close to the infeasible target as possible, but has the
consequence of shifting the distribution from $(y^b, \hat\psi^b)$ to $(y^b, \theta^b)$. Hence a change of variable is
required. The importance weight depends on the Jacobian matrix, making the RS an optimization
based importance sampler.

Lemma 1 Suppose that $\hat\psi^b : \theta \to \hat\psi^b(\theta)$ is one-to-one and $\frac{\partial\hat\psi^b(\theta)}{\partial\theta}$ is full column rank. The
posterior distribution produced by the reverse sampler converges to the infeasible posterior distribution
$p^*_{ABC}(\theta|\hat\psi)$ as $B \to \infty$.

By convergence, we mean that for any measurable function $\varphi(\theta)$ such that the expectation exists,
a law of large numbers implies that $\sum_{b=1}^B \bar w^b(\theta^b)\varphi(\theta^b) \to E_{p^*_{ABC}(\theta|\hat\psi)}[\varphi(\theta)]$. In general, $\bar w^b(\theta^b) \neq \frac{1}{B}$.

Results for moments computed from the RS draws can be interpreted as draws from $p^*_{ABC}$, the
posterior distribution had the likelihood $p(\hat\psi|\theta)$ been available.

We mainly use the RS as a conceptual framework to understand the differences between the
MCMC-ABC and SMD in what follows. While it is not intended to compete with existing
likelihood-free estimators, it can nonetheless be useful in situations when numerical optimization of the
auxiliary model is easy. Properties of the RS are further analyzed in Forneron and Ng (2015).
Meeds and Welling (2015) independently proposes an algorithm similar to the RS, and shows how
it can be implemented efficiently by embarrassingly parallel methods.


4 Quasi-Bayes Estimators 


The GMM objective function $J(\theta)$ defined in (2) is not a proper density. Noting that $\exp(-J(\theta))$ is
the kernel of the Gaussian density, Jiang and Turnbull (2004) defines an indirect likelihood (distinct
from the one defined in Creel and Kristensen (2013)) as

$$L_{IND}(\theta|\hat\psi) = \frac{1}{\sqrt{2\pi}}\,|\Sigma|^{-1/2}\exp(-J(\theta)).$$

Associated with the indirect likelihood is the indirect score, indirect Hessian, and a generalized
information matrix equality, just like a conventional likelihood. Though the indirect likelihood is
not a proper density, its maximizer has properties analogous to the maximum likelihood estimator
provided $E[g_t(\theta_0)] = 0$.


In Chernozhukov and Hong (2003), the authors observe that extremum estimators can be
difficult to compute if the objective function is highly non-convex, especially when the dimension
of the parameter space is large. These difficulties can be alleviated by using Bayesian
computational tools, but this is not possible when the objective function is not a likelihood. Chernozhukov
and Hong (2003) take an exponential of $-J(\theta)$, as in Jiang and Turnbull (2004), but then
combine $\exp(-J(\theta))$ with a prior density $\pi(\theta)$ to produce a quasi-posterior density. Chernozhukov
and Hong initially termed their estimator ‘Quasi-Bayes’ because $\exp(-J(\theta))$ is not a standard
likelihood. They settled on the term ‘Laplace-type estimator’ (LT), so-called because Laplace
suggested to approximate a smooth pdf with a well defined peak by a normal density, see Tierney and
Kadane (1986). If $\pi(\theta)$ is strictly positive and continuous over a compact parameter space $\Theta$, the
‘quasi-posterior’ LT distribution

$$p_{LT}(\theta|y) = \frac{\exp(-J(\theta))\pi(\theta)}{\int_\Theta \exp(-J(\theta))\pi(\theta)\,d\theta} \propto \exp(-J(\theta))\pi(\theta) \tag{4}$$

is proper. The LT posterior mean is thus well-defined even when the prior may not be proper. As
discussed in Chernozhukov and Hong (2003), one can think of the LT under a flat prior as using
simulated annealing to maximize $\exp(-J(\theta))$ and setting the cooling parameter $\tau$ to 1. Frequentist
inference is asymptotically valid because as the sample size increases, the prior is dominated by the
pseudo likelihood which, by the Laplace approximation, is asymptotically normal.⁵

In practice, the LT posterior distribution is targeted using MCMC methods. Upon replacing
the likelihood $L(\theta)$ by $\exp(-J(\theta))$, the MH acceptance probability is

$$\rho_{LT}(\theta^b, \vartheta) = \min\left(\frac{\exp(-J(\vartheta))\pi(\vartheta)q(\theta^b|\vartheta)}{\exp(-J(\theta^b))\pi(\theta^b)q(\vartheta|\theta^b)},\ 1\right).$$

The quasi-posterior mean is $\bar\theta_{LT} = \frac{1}{B}\sum_{b=1}^B \theta^b$ where each $\theta^b$ is a draw from $p_{LT}(\theta|y)$. Chernozhukov
and Hong suggest to exploit the fact that the quasi-posterior mean is much easier to compute than
the mode and that, under regularity conditions, the two are first order equivalent. In practice,
the weighting matrix can be based on some preliminary estimate of $\theta$, or estimated simultaneously
with $\theta$. In exactly identified models, it is well known that the MD estimates do not depend on the
choice of $W$. This continues to be the case for the LT posterior mode $\hat\theta_{LT}$. However, the posterior
mean is affected by the choice of the weighting matrix even in the just-identified case.⁶
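The LT recursion can be sketched on a toy moment condition. The example is an illustration under assumptions that are not in the paper: a single moment $g_t(\theta) = y_t - \theta$, an objective scaled as $J(\theta) = \frac{T}{2}\bar g(\theta)' W \bar g(\theta)$, a flat prior, a symmetric random-walk proposal, and $W$ set to the inverse of the sample variance. Function names and tuning values are hypothetical.

```python
import math
import random

def lt_objective(theta, y, weight):
    """J(theta) = (T / 2) * gbar(theta) * W * gbar(theta) with g_t = y_t - theta."""
    T = len(y)
    gbar = sum(yi - theta for yi in y) / T
    return 0.5 * T * gbar * weight * gbar

def lt_sampler(y, weight, n_draws, step, theta0, rng):
    """Random-walk MH on exp(-J(theta)) under a flat prior."""
    theta, j = theta0, lt_objective(theta0, y, weight)
    draws = []
    for _ in range(n_draws):
        proposal = theta + rng.gauss(0.0, step)
        j_prop = lt_objective(proposal, y, weight)
        # prior and proposal ratios equal one, leaving exp(-J(proposal)) / exp(-J(theta))
        if rng.random() < math.exp(min(0.0, j - j_prop)):
            theta, j = proposal, j_prop
        draws.append(theta)
    return draws

rng = random.Random(6)
T = 100
y = [rng.gauss(1.0, 1.0) for _ in range(T)]
ybar = sum(y) / T
s2 = sum((yi - ybar) ** 2 for yi in y) / (T - 1)
draws = lt_sampler(y, weight=1.0 / s2, n_draws=10000, step=0.2, theta0=ybar, rng=rng)
kept = draws[1000:]                                      # discard burn-in
lt_mean = sum(kept) / len(kept)
print(lt_mean, ybar)
```

Here the quasi-posterior is exactly normal and centred at the sample mean, so the quasi-posterior mean and mode coincide. With a non-quadratic objective the two would generally differ in finite samples, which is the point taken up in the text.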


5 For loss function $d(\cdot)$, the LT estimator is $\hat\theta_{LT} = \arg\min_\theta \int_\Theta d(\theta - \vartheta)\, p_{LT}(\vartheta|y)\,d\vartheta$. If $d(\cdot)$ is quadratic, the
posterior mean minimizes quasi-posterior risk.

6 Kormilitsina and Nekipelov (2014) suggests to scale the objective function to improve coverage of the confidence
intervals.
The LT estimator is built on the validity of the asymptotic normal approximation in the second
order expansion of the objective function. Nekipelov and Kormilitsina (2015) shows that in small
samples, this approximation can be poor so that the LT posterior mean may differ significantly
from the extremum estimate that it is meant to approximate. To see the problem in a different
light, we again take an optimization view. Specifically, the asymptotic distribution
$\sqrt{T}(\hat\psi(\theta_0) - \psi(\theta_0)) \xrightarrow{d} A_\infty(\theta_0)$ suggests to use

$$\hat\psi^b(\theta) \approx \psi(\theta) + \frac{A^b_\infty(\theta_0)}{\sqrt{T}}$$

where $A^b_\infty(\theta_0) \sim \mathcal{N}(0, \Sigma(\theta_0))$. Given a draw of $A^b_\infty$, there will exist a $\theta^b$ such that
$(\hat\psi^b(\theta) - \hat\psi)' W (\hat\psi^b(\theta) - \hat\psi)$ is minimized. In the exactly identified case, this discrepancy can be driven
to zero up to machine precision. Hence we can define

$$\theta^b = \arg\min_\theta \|\hat\psi^b(\theta) - \hat\psi\|.$$


Arguments analogous to the RS suggest the following will produce draws of $\theta$ from $p_{LT}(\theta|y)$.

1 For $b = 1, \ldots, B$:

i Draw $A^b_\infty(\theta_0)$ and define $\hat\psi^b(\theta) = \psi(\theta) + \frac{A^b_\infty(\theta_0)}{\sqrt{T}}$.

ii Solve for $\theta^b$ such that $\hat\psi^b(\theta^b) = \hat\psi$ (up to machine precision).

iii Compute $w^b(\theta^b) = |\hat\psi^b_\theta(\theta^b)|^{-1}\pi(\theta^b)$.

2 $\bar\theta_{LT} = \sum_{b=1}^B \bar w^b(\theta^b)\theta^b$, where $\bar w^b = \frac{w^b(\theta^b)}{\sum_{c=1}^B w^c(\theta^c)}$.

Seen from an optimization perspective, the LT is a weighted average of MD modes with the
determinant of the Jacobian matrix as importance weight, similar to the RS. It differs from the RS
in that the Jacobian here is computed from the asymptotic binding function $\psi(\theta)$ and the draws
are based on the asymptotic normality of $\hat\psi$. As such, simulation of the structural model is not
required.
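The optimization view of the LT can be checked against direct integration in a toy case. The sketch below rests on assumptions that are not in the paper: exponential data with rate $\theta$, the sample mean as auxiliary statistic so that $\psi(\theta) = 1/\theta$, a plug-in variance $\hat\psi^2$ in place of $\Sigma(\theta_0)$, $W$ equal to its inverse, an objective scaled by $T/2$, and a uniform prior on (0.1, 10). All function names are hypothetical.

```python
import math
import random

def prior(theta):
    """Uniform prior on (0.1, 10), up to a constant."""
    return 1.0 if 0.1 < theta < 10.0 else 0.0

def lt_by_optimization(psi_hat, sigma2, T, n_draws, rng):
    thetas, weights = [], []
    sd = math.sqrt(sigma2 / T)
    for _ in range(n_draws):
        shift = rng.gauss(0.0, sd)            # step i: A^b / sqrt(T)
        target = psi_hat - shift              # step ii: need 1 / theta = target
        if target <= 0.0:
            continue                          # no solution in the parameter space
        theta_b = 1.0 / target
        jac = -1.0 / theta_b ** 2             # derivative of psi(theta) = 1 / theta
        thetas.append(theta_b)
        weights.append(prior(theta_b) / abs(jac))   # step iii
    total = sum(weights)
    return sum(w * th for w, th in zip(weights, thetas)) / total   # step 2

def lt_by_quadrature(psi_hat, sigma2, T, grid):
    """Mean of exp(-J(theta)) * prior on a grid, J = (T/2)(psi_hat - 1/theta)^2 / sigma2."""
    w = [math.exp(-0.5 * T * (psi_hat - 1.0 / t) ** 2 / sigma2) * prior(t) for t in grid]
    return sum(t * wi for t, wi in zip(grid, w)) / sum(w)

rng = random.Random(7)
T = 100
y = [rng.expovariate(2.0) for _ in range(T)]   # rate 2, so the mean is 0.5
psi_hat = sum(y) / T
sigma2 = psi_hat ** 2                          # plug-in variance of an exponential
lt_opt = lt_by_optimization(psi_hat, sigma2, T, n_draws=5000, rng=rng)
grid = [0.1 + 0.001 * i for i in range(1, 9900)]
lt_quad = lt_by_quadrature(psi_hat, sigma2, T, grid)
print(lt_opt, lt_quad)
```

No structural model is simulated here: only normal draws and the known binding function are used, and the Jacobian weights convert the distribution of the solutions into the quasi-posterior.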


4.1 The SLT 


When $\psi(\theta)$ is not analytically tractable, a natural modification is to approximate it by simulations
as in the SMD. This is the approach taken in Lise et al. (2015). We refer to this estimator as the
Simulated Laplace-type estimator, or SLT. The steps are as follows:


0 Draw structural innovations £ s = (sf,, £j)' from F e . These are held fixed across iterations. 
1 For b = 1,..., B, draw i? from q(i}\0 b ). 
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i. For s = 1,... S: use ($, e s ) and the model to simulate data y s = (yf,..., y^Y- Compute 

using y s . 

ii. Form = g s (tfyWg s (tf), where = $(y) - ^ J2s=i $*(&)■ 

iii. Set 6 b+l = d with probability psLT(O b i$), else reset $ to 9 b with probability 1 — pslt 
where the acceptance probability is: 


PSLT(O b ,' i?) = min 


/ exp(-J,g(#))7r(#)g(0 fe |#) 
V exp (— Js ( O' b )) 7r ( O' b ) q (D 1 9 ' b ) 



2 Compute 0 b SLT = 
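The steps above can be sketched for a deliberately simple case. The following is a minimal illustration, not the implementation used in the paper: it assumes a hypothetical model $y_t = \theta + \varepsilon_t$ with standard normal innovations, the sample mean as auxiliary statistic, a flat prior, a symmetric random-walk proposal (so the $q$ terms cancel), and a weight $W = T/2$ chosen for this example only.

```python
import numpy as np

def slt_mcmc(y, S=20, B=5000, step=0.2, seed=0):
    """Random-walk Metropolis sketch of the SLT for y_t = theta + eps_t."""
    rng = np.random.default_rng(seed)
    T = y.size
    psi_hat = y.mean()                      # auxiliary statistic on the data
    eps = rng.standard_normal((S, T))       # step 0: drawn once, held fixed
    W = T / 2.0                             # assumed weighting for this example

    def J(theta):
        psi_s = (theta + eps).mean(axis=1)  # psi^s(theta) for s = 1..S
        g = psi_hat - psi_s.mean()          # g_S(theta)
        return W * g * g

    def log_prior(theta):
        return 0.0                          # flat prior (assumption)

    theta, J_cur = psi_hat, J(psi_hat)
    chain = np.empty(B)
    for b in range(B):
        prop = theta + step * rng.standard_normal()   # symmetric proposal
        J_prop = J(prop)
        log_rho = (-J_prop + log_prior(prop)) - (-J_cur + log_prior(theta))
        if np.log(rng.uniform()) < log_rho:           # accept with prob. rho
            theta, J_cur = prop, J_prop
        chain[b] = theta                              # else keep theta^b
    smd = psi_hat - eps.mean()    # SMD mode for this model, for comparison
    return chain.mean(), chain, smd

rng = np.random.default_rng(1)
y = 1.5 + rng.standard_normal(100)
posterior_mean, chain, smd = slt_mcmc(y)
```

In this linear example the quasi-posterior is normal and centered at the SMD estimate, so the chain average and `smd` should be close; in nonlinear models the mean and the mode generally differ.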


The SLT algorithm has two loops, one using $S$ simulations for each $b$ to approximate the asymptotic
binding function, and one using $B$ draws to approximate the 'quasi-posterior' SLT distribution

$$p_{SLT}(\theta|y, \varepsilon^1, \ldots, \varepsilon^S) = \frac{\exp(-J_S(\theta))\pi(\theta)}{\int_\Theta \exp(-J_S(\theta))\pi(\theta)\,d\theta} \propto \exp(-J_S(\theta))\pi(\theta). \tag{5}$$


The above SLT algorithm has features of SMD, ABC, and LT. Like SMD and ABC, SLT
also requires simulations of the full model. As a referee pointed out, the SLT resembles the
ABC algorithm when used with a Gaussian kernel. But $\exp(-J_S(\theta))$ is not a proper density
and $p_{SLT}(\theta|y, \varepsilon^1, \ldots, \varepsilon^S)$ is not a conventional likelihood-based posterior distribution. While the
SLT targets the pseudo likelihood, ABC algorithms target the proper but intractable likelihood.
Furthermore, the asymptotic distribution of $\hat\psi$ is known from a frequentist perspective. In ABC
estimation, lack of knowledge of the likelihood of $\hat\psi$ motivates the Bayesian computation.

The optimization implementation of SLT presents a clear contrast with the ABC.


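1 Given $\varepsilon^s = (\varepsilon^s_1, \ldots, \varepsilon^s_T)'$ for $s = 1, \ldots, S$, repeat for $b = 1, \ldots, B$:

i Draw $\hat\psi^b(\theta) = \frac{1}{S}\sum_{s=1}^S \hat\psi^s(\theta) + \frac{\mathbb{A}^b_\infty(\theta_0)}{\sqrt{T}}$.

ii Solve for $\theta^b$ such that $\hat\psi^b(\theta^b) = \hat\psi$ (up to machine precision).

iii Compute $w^b(\theta^b) = |\hat\psi^b_\theta(\theta^b)|^{-1}\pi(\theta^b)$.

2 $\hat\theta_{SLT} = \sum_{b=1}^B \bar w^b(\theta^b)\theta^b$, where $\bar w^b = \frac{w^b(\theta^b)}{\sum_{c=1}^B w^c(\theta^c)}$.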


While the SLT is a weighted average of SMD modes, the draws of $\hat\psi^b(\theta)$ are taken from the
(frequentist) asymptotic distribution of $\hat\psi$ instead of solving the model at each $b$. Gao and Hong (2014)
use a similar idea to make draws of what we refer to as $g(\theta)$ in their extension of the BIL estimator
of Creel and Kristensen (2013) to non-separable models.

The SMD, RS, ABC, and SLT all require specification and simulation of the full model. At a
practical level, the innovations $\varepsilon^1, \ldots, \varepsilon^S$ used in SMD and SLT are only drawn from $F_\varepsilon$ once and
held fixed across iterations. Equivalently, the seed of the random number generator is fixed so that
the only difference in successive iterations is due to change in the parameters to be estimated. In
contrast, ABC draws new innovations from $F_\varepsilon$ each time a $\theta^{b+1}$ is proposed. We need to simulate
$B$ sets of innovations of length $T$, not counting those used in draws that are rejected, and $B$ is
generally much bigger than $S$. The SLT takes $B$ draws from an asymptotic distribution of $\hat\psi$. Hence
even though some aspects of the algorithms considered seem similar, there are subtle differences.
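The practical difference between holding the innovations fixed and redrawing them can be seen in a small sketch. It assumes a hypothetical location model $y^s_t = \theta + \varepsilon^s_t$ with the sample mean as auxiliary statistic; the names `psi_sim`, `eps_fixed` and the grid are illustrative only.

```python
import numpy as np

T, S = 50, 10
rng = np.random.default_rng(123)
eps_fixed = rng.standard_normal((S, T))   # drawn once, as in SMD and SLT

def psi_sim(theta, eps):
    """Simulated auxiliary statistic: mean over S samples y^s = theta + eps^s."""
    return (theta + eps).mean()

grid = np.linspace(0.0, 1.0, 11)

# Fixed innovations: changes in the statistic come only from changes in theta.
fixed = np.array([psi_sim(th, eps_fixed) for th in grid])

# Fresh innovations at every evaluation, as when ABC proposes a new theta.
fresh = np.array([psi_sim(th, rng.standard_normal((S, T))) for th in grid])

steps_fixed = np.diff(fixed)   # equal to the grid spacing up to rounding
steps_fresh = np.diff(fresh)   # grid spacing plus simulation noise
```

With fixed innovations the simulated statistic is a smooth function of $\theta$, which is what allows an optimizer to be used; with fresh draws the same function is contaminated by noise of order $1/\sqrt{ST}$ at each evaluation.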


5 Properties of the Estimators 


This section studies the finite sample properties of the various estimators. Our goal is to compare
the SMD with the RS, and by implication, the infeasible MCMC-ABC. To do so in a tractable way,
we only consider the expansion up to order $1/T$. As a point of reference, we first note that under
assumptions in Rilstone et al. (1996); Bao and Ullah (2007), $\hat\theta_{ML}$ admits a second order expansion

$$\hat\theta_{ML} = \theta_0 + \frac{A_{ML}(\theta_0)}{\sqrt{T}} + \frac{C_{ML}(\theta_0)}{T} + o_p\left(\frac{1}{T}\right),$$

where $A_{ML}(\theta_0)$ is a mean-zero asymptotically normal random vector and $C_{ML}(\theta_0)$ depends on the
curvature of the likelihood. These terms are defined as

$$A_{ML}(\theta_0) = E[-\ell_{\theta\theta}(\theta_0)]^{-1} Z_S(\theta_0) \tag{6a}$$

$$C_{ML}(\theta_0) = E[-\ell_{\theta\theta}(\theta_0)]^{-1}\left[Z_H(\theta_0)Z_S(\theta_0) - \frac{1}{2}\sum_{j=1}^K \left(-\ell_{\theta\theta\theta_j}(\theta_0)\right) Z_S(\theta_0) Z_{S,j}(\theta_0)\right] \tag{6b}$$

where the normalized score $\frac{1}{\sqrt{T}}\ell_\theta(\theta_0)$ and centered hessian $\frac{1}{\sqrt{T}}(\ell_{\theta\theta}(\theta_0) - E[\ell_{\theta\theta}(\theta_0)])$ converge in
distribution to the normal vectors $Z_S$ and $Z_H$ respectively. The order $1/T$ bias is large when Fisher
information is low.

Classical Bayesian estimators are likelihood based. Hence the posterior mode $\hat\theta_{BC}$ exhibits a
bias similar to that of $\hat\theta_{ML}$. However, the prior $\pi(\theta)$ can be thought of as a constraint, or penalty,
since the posterior mode maximizes $\log p(\theta|y) = \log L(\theta|y) + \log \pi(\theta)$. Furthermore, Kass
et al. (1990) show that the posterior mean deviates from the posterior mode by a term that
depends on the second derivatives of the log-likelihood. Accordingly, there are three sources of bias
in the posterior mean $\bar\theta_{BC}$: a likelihood component, a prior component, and a component from
approximating the mode by the mean. Hence

$$\bar\theta_{BC} = \theta_0 + \frac{A_{ML}(\theta_0)}{\sqrt{T}} + \frac{1}{T}\left[C_{BC}(\theta_0) + \frac{\pi_\theta(\theta_0)}{\pi(\theta_0)} C^P_{BC}(\theta_0) + C^M_{BC}(\theta_0)\right] + o_p\left(\frac{1}{T}\right).$$

Note that the prior component is under the control of the researcher.

In what follows, we will show that posterior means based on auxiliary statistics $\hat\psi$ generically
have the above representation, but the composition of the terms differs.

5.1 Properties of $\hat\theta_{SMD}$

Minimum distance estimators depend on auxiliary statistics $\hat\psi$. Their properties have been analyzed
in Newey and Smith (2004a, Section 4.2) within an empirical-likelihood framework. To facilitate
subsequent analysis, we follow Gourieroux and Monfort (1996, Ch.4.4) and directly expand around
$\hat\psi$, under the assumption that it admits a second-order expansion. In particular, since $\hat\psi$ is $\sqrt{T}$
consistent for $\psi(\theta_0)$, $\hat\psi$ has expansion

$$\hat\psi = \psi(\theta_0) + \frac{\mathbb{A}(\theta_0)}{\sqrt{T}} + \frac{\mathbb{C}(\theta_0)}{T} + o_p\left(\frac{1}{T}\right). \tag{7}$$

It is then straightforward to show that the minimum distance estimator $\hat\theta_{MD}$ has expansion terms

$$A_{MD}(\theta_0) = \left[\psi_\theta(\theta_0)\right]^{-1} \mathbb{A}(\theta_0) \tag{8a}$$

$$C_{MD}(\theta_0) = \left[\psi_\theta(\theta_0)\right]^{-1}\left[\mathbb{C}(\theta_0) - \frac{1}{2}\sum_{j=1}^K \psi_{\theta,\theta_j}(\theta_0) A_{MD}(\theta_0) A_{MD,j}(\theta_0)\right]. \tag{8b}$$

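The bias in $\hat\theta_{MD}$ depends on the curvature of the binding function and the bias in the auxiliary
statistic $\hat\psi$, $\mathbb{C}(\theta_0)$. Then following Gourieroux et al. (1999), we can analyze the SMD as follows. In
view of (7), we have, for each $s$:

$$\hat\psi^s(\theta) = \psi(\theta) + \frac{\mathbb{A}^s(\theta)}{\sqrt{T}} + \frac{\mathbb{C}^s(\theta)}{T} + o_p\left(\frac{1}{T}\right).$$

The estimator $\hat\theta_{SMD}$ satisfies $\hat\psi = \frac{1}{S}\sum_{s=1}^S \hat\psi^s(\hat\theta_{SMD})$ and has expansion
$\hat\theta_{SMD} = \theta_0 + \frac{A_{SMD}(\theta_0)}{\sqrt{T}} + \frac{C_{SMD}(\theta_0)}{T} + o_p(\frac{1}{T})$. Plugging in the Edgeworth expansions gives:

$$\psi(\theta_0) + \frac{\mathbb{A}(\theta_0)}{\sqrt{T}} + \frac{\mathbb{C}(\theta_0)}{T} + o_p\left(\frac{1}{T}\right) = \frac{1}{S}\sum_{s=1}^S\left[\psi(\hat\theta_{SMD}) + \frac{\mathbb{A}^s(\hat\theta_{SMD})}{\sqrt{T}} + \frac{\mathbb{C}^s(\hat\theta_{SMD})}{T} + o_p\left(\frac{1}{T}\right)\right].$$

Expanding $\psi(\hat\theta_{SMD})$ and $\mathbb{A}^s(\hat\theta_{SMD})$ around $\theta_0$ and equating terms in the expansion of $\hat\theta_{SMD}$,

$$A_{SMD}(\theta_0) = \left[\psi_\theta(\theta_0)\right]^{-1}\left(\mathbb{A}(\theta_0) - \frac{1}{S}\sum_{s=1}^S \mathbb{A}^s(\theta_0)\right) \tag{9a}$$

$$C_{SMD}(\theta_0) = \left[\psi_\theta(\theta_0)\right]^{-1}\left(\mathbb{C}(\theta_0) - \frac{1}{S}\sum_{s=1}^S \mathbb{C}^s(\theta_0) - \Big(\frac{1}{S}\sum_{s=1}^S \mathbb{A}^s_\theta(\theta_0)\Big) A_{SMD}(\theta_0)\right) - \frac{1}{2}\left[\psi_\theta(\theta_0)\right]^{-1}\sum_{j=1}^K \psi_{\theta,\theta_j}(\theta_0) A_{SMD}(\theta_0) A_{SMD,j}(\theta_0). \tag{9b}$$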
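The first order term can be written as $A_{SMD} = A_{MD} - \frac{1}{S}[\psi_\theta(\theta_0)]^{-1}\sum_{s=1}^S \mathbb{A}^s(\theta_0)$; the last term has
variance of order $1/S$, which accounts for simulation noise. Note also that $E\left[\frac{1}{S}\sum_{s=1}^S \mathbb{C}^s(\theta_0)\right] =
E[\mathbb{C}(\theta_0)]$. Hence, unlike the MD, $E[C_{SMD}(\theta_0)]$ does not depend on the bias $\mathbb{C}(\theta_0)$ in the auxiliary
statistic. In the special case when $\hat\psi$ is a consistent estimator of $\theta_0$, $\psi_\theta(\theta_0)$ is the identity map and
the term involving $\psi_{\theta,\theta_j}(\theta_0)$ drops out. Consequently, the SMD has no bias of order $1/T$ when $S \to \infty$
and $\psi(\theta) = \theta$. In general, the bias of $\hat\theta_{SMD}$ depends on the curvature of the binding function as

$$E[C_{SMD}(\theta_0)] \xrightarrow{S\to\infty} -\frac{1}{2}\left[\psi_\theta(\theta_0)\right]^{-1}\sum_{j=1}^K \psi_{\theta,\theta_j}(\theta_0)\, E\left[A_{MD}(\theta_0) A_{MD,j}(\theta_0)\right]. \tag{10}$$

This is an improvement over $\hat\theta_{MD}$ because as seen from (8b),

$$E[C_{MD}(\theta_0)] = \left[\psi_\theta(\theta_0)\right]^{-1} E[\mathbb{C}(\theta_0)] - \frac{1}{2}\left[\psi_\theta(\theta_0)\right]^{-1}\sum_{j=1}^K \psi_{\theta,\theta_j}(\theta_0)\, E\left[A_{MD}(\theta_0) A_{MD,j}(\theta_0)\right]. \tag{11}$$

The bias in $\hat\theta_{MD}$ has an additional term in $\mathbb{C}(\theta_0)$.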


5.2 Properties of $\hat\theta_{RS}$

The ABC literature has focused on the convergence properties of the algorithms and less on the
theoretical properties of the estimates. Dean et al. (2011) establish consistency of the ABC in
the case of hidden Markov models. The analysis considers a scheme so that maximum likelihood
estimation based on the ABC algorithm is equivalent to exact inference under the perturbed hidden
Markov scheme. The authors find that the asymptotic bias depends on the ABC tolerance $\delta$. Calvet
and Czellar (2015) provide an upper bound for the mean-squared error of their ABC filter and
study how the choice of the bandwidth affects properties of the filter. Under high level conditions
and adopting the empirical likelihood framework of Newey and Smith (2004b), Creel and Kristensen
(2013) show that the infeasible BIL is second order equivalent to the MIL after bias adjustments,
while MIL is in turn first order equivalent to the continuously updated GMM. The feasible SBIL
(which is also an ABC estimator) has additional errors compared to the BIL due to simulation noise
and kernel smoothing, but these errors vanish as $S \to \infty$ for an appropriately chosen bandwidth.
Gao and Hong (2014) show that local-regressions have better variance properties compared to
kernel estimations of the indirect likelihood. Both studies find that a large number of simulations
are needed to control for the stochastic approximation error.

The results of Creel and Kristensen (2013) and Gao and Hong (2014) shed light on the
relationship between SMD and ABC from the perspective of non-parametric estimation. The difficulty
in analyzing ABC algorithms comes from the fact that simulation introduces additional randomness
which interacts with smoothing bias induced by non-parametric estimation of the density. These
effects are difficult to make precise. We present an optimization/importance sampling perspective
by appealing to an implication of Proposition 1 (i), which is that $\hat\theta_{RS}$ is the weighted average of a
sequence of SMD modes. Analysis of the weights $w^b(\theta^b)$ requires an expansion of $\hat\psi^b_\theta(\theta^b)$ around
$\psi_\theta(\theta_0)$. From such an analysis, shown in the Appendix, we find that


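$$\hat\theta_{RS} = \sum_{b=1}^B \bar w^b(\theta^b)\theta^b = \theta_0 + \frac{A_{RS}(\theta_0)}{\sqrt{T}} + \frac{C_{RS}(\theta_0)}{T} + o_p\left(\frac{1}{T}\right)$$

where

$$A_{RS}(\theta_0) = \frac{1}{B}\sum_{b=1}^B A^b_{RS}(\theta_0) = \left[\psi_\theta(\theta_0)\right]^{-1}\frac{1}{B}\sum_{b=1}^B\left(\mathbb{A}(\theta_0) - \mathbb{A}^b(\theta_0)\right) \tag{12a}$$

$$C_{RS}(\theta_0) = \frac{1}{B}\sum_{b=1}^B C^b_{RS}(\theta_0) + \frac{\pi_\theta(\theta_0)}{\pi(\theta_0)}\left[\frac{1}{B}\sum_{b=1}^B\left(A^b_{RS}(\theta_0) - A_{RS}(\theta_0)\right)A^b_{RS}(\theta_0)\right] + C^M_{RS}(\theta_0). \tag{12b}$$

Proposition 1 Let $\hat\psi(\theta)$ be the auxiliary statistic that admits the expansion as in (7) and suppose
that the prior $\pi(\theta)$ is positive and continuously differentiable around $\theta_0$. Then $E[A_{RS}(\theta_0)] = 0$ but
$E[C_{RS}(\theta_0)] \neq 0$ for an arbitrary choice of prior.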

The SMD and RS are first order equivalent, but $\hat\theta_{RS}$ has an order $1/T$ bias. The bias, given by
$C_{RS}(\theta_0)$, has three components. The $C^M_{RS}(\theta_0)$ term (defined in Appendix A) can be traced directly
to the weights, or to the interaction of the weights with the prior, and is a function of $A_{RS}(\theta_0)$.
Some but not all the terms vanish as $B \to \infty$. The second term will be zero if a uniform prior is
chosen since $\pi_\theta = 0$. A similar result is obtained in Creel and Kristensen (2013). The first term is

$$\frac{1}{B}\sum_{b=1}^B C^b_{RS}(\theta_0) = \left[\psi_\theta(\theta_0)\right]^{-1}\frac{1}{B}\sum_{b=1}^B\left(\mathbb{C}(\theta_0) - \mathbb{C}^b(\theta_0) - \frac{1}{2}\sum_{j=1}^K \psi_{\theta,\theta_j}(\theta_0) A^b_{RS}(\theta_0) A^b_{RS,j}(\theta_0) - \mathbb{A}^b_\theta(\theta_0) A^b_{RS}(\theta_0)\right).$$

The term $\mathbb{C}(\theta_0) - \frac{1}{B}\sum_{b=1}^B \mathbb{C}^b(\theta_0)$ is exactly the same as in $C_{SMD}(\theta_0)$. The middle term involves
$\psi_{\theta,\theta_j}(\theta_0)$ and is zero if $\psi(\theta) = \theta$. But because the summation is over $\theta^b$ instead of $\hat\psi^s$,

$$\frac{1}{B}\sum_{b=1}^B \mathbb{A}^b_\theta(\theta_0) A^b_{RS}(\theta_0) \xrightarrow{B\to\infty} E\left[\mathbb{A}^b_\theta(\theta_0) A^b_{RS}(\theta_0)\right] \neq 0.$$

As a consequence $E[C_{RS}(\theta_0)] \neq 0$ even when $\psi(\theta) = \theta$. In contrast, $E[C_{SMD}(\theta_0)] \to 0$ as $S \to \infty$ when
$\psi(\theta) = \theta$, as seen from (10). The reason is that the comparable term in $C_{SMD}(\theta_0)$ is

$$\frac{1}{S}\sum_{s=1}^S \mathbb{A}^s_\theta(\theta_0) A_{SMD}(\theta_0) \xrightarrow{S\to\infty} E\left[\mathbb{A}^s_\theta(\theta_0)\right] A_{SMD}(\theta_0) = 0.$$

The difference boils down to the fact that the SMD is the mode of the average over simulated
auxiliary statistics, while the RS is a weighted average over the modes. As will be seen below, this
difference is also present in the LT and SLT and comes from averaging over $\theta^b$. The result is based
on fixing $\delta$ at zero and holds for any $B$. Proposition 1 implies that the ideal MCMC-ABC with
$\delta = 0$ also has a non-negligible second-order bias.

In theory, the order $1/T$ bias can be removed if $\pi(\theta)$ can be found to put the right hand side of
$C_{RS}(\theta_0)$ defined in (12b) to zero. Then $\hat\theta_{RS}$ will be second order equivalent to SMD when $\psi(\theta) = \theta$
and may have a smaller bias than SMD when $\psi(\theta) \neq \theta$ since SMD has a non-removable second order
bias in that case. That the choice of prior will have bias implications for likelihood-free estimation
echoes the findings in the parametric likelihood setting. Arellano and Bonhomme (2009) show in
the context of non-linear panel data models that the first-order bias in Bayesian estimators can be
eliminated with a particular prior on the individual effects. Bester and Hansen (2006) also show
that in the estimation of parametric likelihood models, the order $1/T$ bias in the posterior mode
and mean can be removed using objective Bayesian priors. They suggest replacing the population
quantities in a differential equation with sample estimates. Finding the bias-reducing prior for the
RS involves solving the differential equation:

$$0 = E[C^b_{RS}(\theta_0)] + \frac{\pi_\theta(\theta_0)}{\pi(\theta_0)} E\left[\left(A^b_{RS}(\theta_0) - A_{RS}(\theta_0)\right)A^b_{RS}(\theta_0)\right] + E[C^M_{RS}(\theta_0)]$$

which has the additional dependence on $\pi$ in $C^M_{RS}(\theta_0, \pi(\theta_0))$ that is not present in Bester and
Hansen (2006). A closed-form solution is available only for simple examples as we will see in Section
6.1 below. For realistic problems, how to find and implement the bias-reducing prior is not a trivial
problem. A natural starting point is the plug-in procedure of Bester and Hansen (2006) but little is
known about its finite sample properties even in the likelihood setting for which it was developed.

Finally, this section has studied the RS, which is the best that the MCMC-ABC can achieve
in terms of $\delta$. This enables us to make a comparison with the SMD holding the same $L_2$ distance
between $\hat\psi$ and $\hat\psi(\theta)$ at zero up to machine precision. However, the MCMC-ABC algorithm with $\delta > 0$
will not produce draws with the same distribution as the RS. To see the problem, suppose that the
RS draws are obtained by stopping the optimizer before $\|\hat\psi - \hat\psi(\theta^b)\|$ reaches the tolerance guided
by machine precision. This is analogous to equating $\hat\psi(\theta^b)$ to the pseudo estimate $\hat\psi + \delta$. Inverting
the binding function will yield an estimate of $\theta$ that depends on the random $\delta$ in an intractable way.
The RS estimate will thus have an additional bias from $\delta \neq 0$. By implication, the MCMC-ABC
with $\delta > 0$ will be second order equivalent to the SMD only after a bias adjustment even when
$\psi(\theta) = \theta$.


5.3 The Properties of LT and SLT

The mode of $\exp(-J(\theta))\pi(\theta)$ will inherit the properties of a MD estimator. However, the
quasi-posterior mean has two additional sources of bias, one arising from the prior, and one from
approximating the mode by the mean. The optimization view of $\hat\theta_{LT}$ facilitates an understanding of these
effects. As shown in Appendix B, each draw $\theta^b_{LT}$ has expansion terms

$$A^b_{LT}(\theta_0) = \left[\psi_\theta(\theta_0)\right]^{-1}\left(\mathbb{A}(\theta_0) - \mathbb{A}^b_\infty(\theta_0)\right)$$

$$C^b_{LT}(\theta_0) = \left[\psi_\theta(\theta_0)\right]^{-1}\left(\mathbb{C}(\theta_0) - \frac{1}{2}\sum_{j=1}^K \psi_{\theta,\theta_j}(\theta_0) A^b_{LT}(\theta_0) A^b_{LT,j}(\theta_0) - \mathbb{A}^b_{\infty,\theta}(\theta_0) A^b_{LT}(\theta_0)\right).$$

Even though the LT has the same objective function as MD, simulation noise enters both $A^b_{LT}(\theta_0)$
and $C^b_{LT}(\theta_0)$. Compared to the extremum estimate $\hat\theta_{MD}$, we see that $A_{LT} = \frac{1}{B}\sum_{b=1}^B A^b_{LT}(\theta_0) \neq
A_{MD}(\theta_0)$ and $C_{LT}(\theta_0) \neq C_{MD}(\theta_0)$. Although $C_{LT}(\theta_0)$ has the same terms as $C_{RS}(\theta_0)$, they are
different because the LT uses the asymptotic binding function, and hence $A^b_{LT}(\theta_0) \neq A^b_{RS}(\theta_0)$.
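A similar stochastic expansion of each $\theta^b_{SLT}$ gives:

$$A^b_{SLT}(\theta_0) = \left[\psi_\theta(\theta_0)\right]^{-1}\left(\mathbb{A}(\theta_0) - \frac{1}{S}\sum_{s=1}^S \mathbb{A}^s(\theta_0) - \mathbb{A}^b_\infty(\theta_0)\right)$$

$$C^b_{SLT}(\theta_0) = \left[\psi_\theta(\theta_0)\right]^{-1}\left(\mathbb{C}(\theta_0) - \frac{1}{S}\sum_{s=1}^S \mathbb{C}^s(\theta_0) - \frac{1}{2}\sum_{j=1}^K \psi_{\theta,\theta_j}(\theta_0) A^b_{SLT}(\theta_0) A^b_{SLT,j}(\theta_0)\right) - \left[\psi_\theta(\theta_0)\right]^{-1}\left(\frac{1}{S}\sum_{s=1}^S\left(\mathbb{A}^s_\theta(\theta_0) + \mathbb{A}^b_{\infty,\theta}(\theta_0)\right)A^b_{SLT}(\theta_0)\right).$$

Following the same argument as in the RS, an optimally chosen prior can reduce bias, at least in
theory, but finding this prior will not be a trivial task. Overall, the SLT has features of the RS (bias
does not depend on $\mathbb{C}(\theta_0)$) and the LT (dependence on $\mathbb{A}^b_\infty$) but is different from both. Because
the SLT uses simulations to approximate the binding function $\psi(\theta)$, $E[\mathbb{C}(\theta_0) - \frac{1}{S}\sum_{s=1}^S \mathbb{C}^s(\theta_0)] = 0$.
The improvement over the LT is analogous to the improvement of SMD over MD. However,
$A^b_{SLT}(\theta_0)$ is affected by estimation of the binding function (the term with superscript $s$) and of
the quasi-posterior density (the terms with superscript $b$). This results in simulation noise with
variance of order $1/S$ plus another of order $1/B$. Note also that the SLT bias has an additional
term

$$\frac{1}{B}\sum_{b=1}^B\left(\frac{1}{S}\sum_{s=1}^S \mathbb{A}^s_\theta(\theta_0) + \mathbb{A}^b_{\infty,\theta}(\theta_0)\right)A^b_{SLT}(\theta_0) \xrightarrow{S\to\infty} \frac{1}{B}\sum_{b=1}^B \mathbb{A}^b_{\infty,\theta}(\theta_0) A^b_{SLT}(\theta_0).$$

The main difference with the RS is that $\mathbb{A}^b$ is replaced with $\mathbb{A}^b_\infty$. For $S = \infty$ this term matches
that of the LT.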


5.4 Overview

We started this section by noting that the Bayesian posterior mean has two components in its bias,
one arising from the prior which acts like a penalty on the objective function, and one arising from
the approximation of the mean for the mode. We are now in a position to use the results in the
foregoing subsections to show that for $d \in \{$MD, SMD, RS, LT, SLT$\}$ and $D = \{$RS, LT, SLT$\}$
these estimators can be represented as

$$\hat\theta_d = \theta_0 + \frac{A_d(\theta_0)}{\sqrt{T}} + \frac{C_d(\theta_0)}{T} + \frac{\mathbb{1}_{d\in D}}{T}\left[\frac{\pi_\theta(\theta_0)}{\pi(\theta_0)} C^P_d(\theta_0) + C^M_d(\theta_0)\right] + o_p\left(\frac{1}{T}\right) \tag{13}$$

where with $A^b_d(\theta_0) = [\psi_\theta(\theta_0)]^{-1}\left(\mathbb{A}(\theta_0) - \mathbb{A}^b_d(\theta_0)\right)$,

$$A_d(\theta_0) = \frac{1}{B}\sum_{b=1}^B A^b_d(\theta_0)$$

$$C_d(\theta_0) = \frac{1}{B}\sum_{b=1}^B \left[\psi_\theta(\theta_0)\right]^{-1}\left(\mathbb{C}(\theta_0) - \mathbb{C}_d(\theta_0) - \frac{1}{2}\sum_{j=1}^K \psi_{\theta,\theta_j}(\theta_0) A^b_d(\theta_0) A^b_{d,j}(\theta_0) - \mathbb{A}^b_{d,\theta}(\theta_0) A^b_d(\theta_0)\right)$$

$$C^P_d(\theta_0) = \frac{1}{B}\sum_{b=1}^B\left(A^b_d(\theta_0) - A_d(\theta_0)\right)A^b_d(\theta_0).$$

The term $C^P_d(\theta_0)$ is a bias directly due to the prior. The term $C^M_d(\theta_0)$, defined in the Appendix,
depends on $A_d(\theta_0)$, the curvature of the binding function, and their interaction with the prior.
Hence at a general level, the estimators can be distinguished by whether or not Bayesian
computation tools are used, as the indicator function is null only for the two frequentist estimators (MD
and SMD). More fundamentally, the estimators differ because of $A_d(\theta_0)$ and $C_d(\theta_0)$, which in turn
depend on $\mathbb{A}^b_d(\theta_0)$ and $\mathbb{C}_d(\theta_0)$.

We compactly summarize the differences as follows 


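| $d$ | $\mathbb{A}^b_d(\theta_0)$ | $\mathbb{C}_d(\theta_0)$ | $\mathrm{var}(A_d(\theta_0))$ | $E[\mathbb{C}(\theta_0) - \mathbb{C}_d(\theta_0)]$ |
|---|---|---|---|---|
| MD | 0 | 0 | 0 | $E[\mathbb{C}(\theta_0)]$ |
| LT | $\mathbb{A}^b_\infty(\theta_0)$ | 0 | $\frac{1}{B}\mathrm{var}[\mathbb{A}^b_\infty(\theta_0)]$ | $E[\mathbb{C}(\theta_0)]$ |
| RS | $\mathbb{A}^b(\theta_0)$ | $\frac{1}{B}\sum_{b=1}^B \mathbb{C}^b(\theta_0)$ | $\frac{1}{B}\mathrm{var}[\mathbb{A}^b(\theta_0)]$ | 0 |
| SMD | $\frac{1}{S}\sum_{s=1}^S \mathbb{A}^s(\theta_0)$ | $\frac{1}{S}\sum_{s=1}^S \mathbb{C}^s(\theta_0)$ | $\frac{1}{S}\mathrm{var}[\mathbb{A}^s(\theta_0)]$ | 0 |
| SLT | $\mathbb{A}_{SMD}(\theta_0) + \mathbb{A}^b_{LT}(\theta_0)$ | $\frac{1}{S}\sum_{s=1}^S \mathbb{C}^s(\theta_0)$ | $\mathrm{var}[A_{SMD}(\theta_0)] + \mathrm{var}[A_{LT}(\theta_0)]$ | 0 |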


The MD is the only estimator that is optimization based and does not involve simulations.
Hence it does not depend on $b$ or $s$ and has no simulation noise. The SMD does not depend on
$b$ because the optimization problem is solved only once. The LT simulates from the asymptotic
binding function. Hence its errors are associated with parameters of the asymptotic distribution.

The MD and LT have a bias due to asymptotic approximation of the binding function. In such
cases, Cabrera and Fernholz (1999) suggest adjusting an initial estimate $\tilde\theta$ such that if the new
estimate $\hat\theta$ were the true value of $\theta$, the mean of the original estimator equals the observed value
$\tilde\theta$. Their target estimator is the $\theta$ such that $E_{P_\theta}[\hat\theta] = \tilde\theta$. While the bootstrap directly estimates
the bias, a target estimator corrects for the bias implicitly. Cabrera and Hu (2001) show that the
bootstrap estimator corresponds to the first step of a target estimator. The latter improves upon
the bootstrap estimator by providing more iterations.

An auxiliary statistic based target estimator is the $\theta$ that solves $E_{P_\theta}[\hat\psi(y(\theta))] = \hat\psi(y(\theta_0))$. It
replaces the asymptotic binding function $\lim_{T\to\infty} E[\hat\psi(y(\theta_0))]$ by $E_{P_\theta}[\hat\psi(y(\theta))]$ and approximates
the expectation under $P_\theta$ by stochastic expansions. The SMD and SLT can be seen as target
estimators that approximate the expectation by simulations. Thus, they improve upon the MD
estimator even when the binding function is tractable and are especially appealing when it is not.
However, the improvement in the SLT is partially offset by having to approximate the mode by the
mean.


6 Two Examples 

The preceding section can be summarized as follows. A posterior mean computed through auxiliary 
statistics generically has a component due to the prior, and a component due to the approximation 
of the mode by the mean. The binding function is better approximated by simulations than
asymptotic analysis. It is possible for simulation estimation to perform better than $\hat\psi_{MD}$ even if
$\psi(\theta)$ were analytically and computationally tractable.

In this section, we first illustrate the above findings using a simple analytical example. We then 
evaluate the properties of the estimators using the dynamic panel model with fixed effects. 

6.1 An Analytical Example 

We consider the simple DGP $y_t \sim N(m, \sigma^2)$. The parameters of the model are $\theta = (m, \sigma^2)'$. We
focus on $\sigma^2$ since the estimators have more interesting properties.

The MLE of $\theta$ is

$$\hat m = \frac{1}{T}\sum_{t=1}^T y_t, \qquad \hat\sigma^2 = \frac{1}{T}\sum_{t=1}^T (y_t - \bar y)^2.$$

While the posterior distribution is dominated by the likelihood in large samples, the effect of
the prior is not negligible in small samples. We therefore begin with an analysis of the effect of the
prior on the posterior mean and mode in Bayesian analysis. Details of the calculations are provided
in Appendix D.1.

We consider the prior $\pi(m, \sigma^2) = (\sigma^2)^{-\alpha}\,\mathbb{1}_{\sigma^2 > 0}$, $\alpha > 0$, so that the log posterior distribution is

$$\log p(\theta|y) = \log p(\theta|\hat m, \hat\sigma^2) \propto -\frac{T}{2}\log(2\pi\sigma^2) - \alpha\log\sigma^2 - \frac{1}{2\sigma^2}\sum_{t=1}^T (y_t - m)^2, \qquad \sigma^2 > 0.$$

The posterior mode and mean of $\sigma^2$ are $\sigma^2_{mode} = \frac{T\hat\sigma^2}{T + 2\alpha}$ and $\sigma^2_{mean} = \frac{T\hat\sigma^2}{T + 2\alpha - 5}$, respectively. Using
the fact that $E[\hat\sigma^2] = \frac{T-1}{T}\sigma^2$, we can evaluate $\sigma^2_{mode}$, $\sigma^2_{mean}$ and their expected values for different
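$\alpha$. Two features are of note. First, for a given prior (here indexed by $\alpha$), the mean does not coincide with
the mode. Second, the statistic (be it mean or mode) varies with $\alpha$. The Jeffreys prior corresponds
to $\alpha = 1$, but the bias-reducing prior is $\alpha = 2$. In the Appendix, we show that the bias reducing
prior for this model is $\pi^R(\theta) \propto 1/\sigma^4$.

Table 1: Mean $\bar\theta_{BC}$ vs. Mode $\hat\theta_{BC}$

| $\alpha$ | $\bar\theta_{BC}$ | $\hat\theta_{BC}$ | $E[\bar\theta_{BC}]$ | $E[\hat\theta_{BC}]$ |
|---|---|---|---|---|
| 0 | $\hat\sigma^2\frac{T}{T-5}$ | $\hat\sigma^2$ | $\sigma^2\frac{T-1}{T-5}$ | $\sigma^2\frac{T-1}{T}$ |
| 1 | $\hat\sigma^2\frac{T}{T-3}$ | $\hat\sigma^2\frac{T}{T+2}$ | $\sigma^2\frac{T-1}{T-3}$ | $\sigma^2\frac{T-1}{T+2}$ |
| 2 | $\hat\sigma^2\frac{T}{T-1}$ | $\hat\sigma^2\frac{T}{T+4}$ | $\sigma^2$ | $\sigma^2\frac{T-1}{T+4}$ |
| 3 | $\hat\sigma^2\frac{T}{T+1}$ | $\hat\sigma^2\frac{T}{T+6}$ | $\sigma^2\frac{T-1}{T+1}$ | $\sigma^2\frac{T-1}{T+6}$ |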

Next, we consider estimators based on auxiliary statistics:

$$\hat\psi(y)' = \left(\hat m \quad \hat\sigma^2\right).$$

As these are sufficient statistics, we can also consider (exact) likelihood-based Bayesian inference.
For SMD estimation, we let $(\hat m_S, \hat\sigma^2_S) = \left(\frac{1}{S}\sum_{s=1}^S \hat m^s, \frac{1}{S}\sum_{s=1}^S \hat\sigma^{2,s}\right)$. The LT quasi-likelihood using
the variance of preliminary estimates of $m$ and $\sigma^2$ as weights is:

$$\exp(-J(m, \sigma^2)) = \exp\left(-\frac{T}{2}\left[\frac{(\hat m - m)^2}{\hat\sigma^2} + \frac{(\hat\sigma^2 - \sigma^2)^2}{2\hat\sigma^4}\right]\right).$$

The LT posterior distribution is $p(m, \sigma^2|\hat m, \hat\sigma^2) \propto \pi(m, \sigma^2)\exp(-J(m, \sigma^2))$. Integrating out $m$ gives
$p(\sigma^2|\hat m, \hat\sigma^2)$. We consider a flat prior $\pi^U(\theta) \propto \mathbb{1}_{\sigma^2 > 0}$ and the bias-reducing prior $\pi^R(\theta) \propto 1/\sigma^4\,\mathbb{1}_{\sigma^2 > 0}$.
The RS is the same as the SMD under a bias-reducing prior. Thus,

$$\hat\sigma^2_{SMD} = \frac{\hat\sigma^2}{\frac{1}{ST}\sum_{s=1}^S\sum_{t=1}^T (e^s_t - \bar e^s)^2}$$

$$\hat\sigma^{2,R}_{RS} = \frac{\hat\sigma^2}{\frac{1}{BT}\sum_{b=1}^B\sum_{t=1}^T (e^b_t - \bar e^b)^2}$$

$$\hat\sigma^{2,U}_{RS} = \frac{\sum_{b=1}^B \hat\sigma^2 \Big/ \left[\sum_{t=1}^T (e^b_t - \bar e^b)^2/T\right]^2}{\sum_{b'=1}^B 1 \Big/ \left[\sum_{t=1}^T (e^{b'}_t - \bar e^{b'})^2/T\right]}.$$

For completeness, the parametric Bootstrap bias corrected estimator $\hat\sigma^2_{Bootstrap} = 2\hat\sigma^2 - E_{Bootstrap}(\hat\sigma^2)$
is also considered:

$$\hat\sigma^2_{Bootstrap} = 2\hat\sigma^2 - \hat\sigma^2\frac{T-1}{T} = \hat\sigma^2\left(1 + \frac{1}{T}\right).$$

$E_{Bootstrap}(\hat\sigma^2)$ computes the expected value of the estimator replacing the true value $\sigma^2$ with $\hat\sigma^2$,
the plug-in estimate. In this example the bias can be computed analytically since $E(\hat\sigma^2(1 + \frac{1}{T})) =
\sigma^2(1 - \frac{1}{T})(1 + \frac{1}{T}) = \sigma^2(1 - \frac{1}{T^2})$. While the bootstrap does not involve inverting the binding function,
this computational simplicity comes at the cost of adding a higher order bias term (in $1/T^2$).
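The bias arithmetic for the bootstrap correction can be checked with exact rational numbers. The sketch below only evaluates the two expectations stated above, $E[\hat\sigma^2] = \sigma^2(T-1)/T$ and $E[\hat\sigma^2(1 + 1/T)]$; the function names are illustrative.

```python
from fractions import Fraction

def expected_sigma2_hat(T, sigma2=Fraction(1)):
    """E[sigma2_hat] = sigma2 * (T - 1) / T for the MLE of the variance."""
    return sigma2 * Fraction(T - 1, T)

def expected_bootstrap(T, sigma2=Fraction(1)):
    """E[sigma2_hat * (1 + 1/T)], the mean of the bootstrap-corrected estimator."""
    return (1 + Fraction(1, T)) * expected_sigma2_hat(T, sigma2)

for T in (5, 10, 50):
    bias_mle = expected_sigma2_hat(T) - 1    # equals -1/T
    bias_boot = expected_bootstrap(T) - 1    # equals -1/T^2
    print(T, bias_mle, bias_boot)
```

The correction removes the order $1/T$ term and leaves a bias of order $1/T^2$, in line with the remark above.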

Figure 1: ABC vs. RS Posterior Density

A main finding of this paper is that the reverse sampler can replicate draws from $p^*_{ABC}(\theta_0)$,
which in turn equals the Bayesian posterior distribution if $\hat\psi$ are sufficient statistics. The weight for
each SMD estimate is the prior times the Jacobian. To illustrate the importance of the Jacobian
transformation, the top panel of Figure 1 plots the Bayesian/ABC posterior distribution and the
one obtained from the reverse sampler. They are indistinguishable. The bottom panel shows an
incorrectly constructed reverse sampler that does not apply the Jacobian transformation. Notably,
the two distributions are not the same.
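The role of the Jacobian weight in this example can be sketched as follows. The code is an illustration under stated assumptions, not the paper's implementation: for simulated data $y^b_t = m + \sigma e^b_t$ the simulated variance statistic is $\sigma^2 s_b$ with $s_b = \sum_t (e^b_t - \bar e^b)^2/T$, so each draw solves $\sigma^{2,b} = \hat\sigma^2/s_b$, and the Jacobian of $(m + \sigma\bar e^b, \sigma^2 s_b)$ with respect to $(m, \sigma^2)$ has determinant $s_b$. The function name and arguments are hypothetical.

```python
import numpy as np

def reverse_sampler_sigma2(sigma2_hat, T, B=2000, seed=0,
                           prior=lambda s2: 1.0, use_jacobian=True):
    """Reverse sampler for sigma^2 in y_t ~ N(m, sigma^2), a minimal sketch."""
    rng = np.random.default_rng(seed)
    e = rng.standard_normal((B, T))      # fresh innovations for every draw b
    s_b = e.var(axis=1)                  # sum_t (e_t - ebar)^2 / T  (ddof = 0)
    draws = sigma2_hat / s_b             # exact solution of sigma^2 * s_b = sigma2_hat
    w = np.array([prior(d) for d in draws], dtype=float)
    if use_jacobian:
        w = w / s_b                      # |det Jacobian|^{-1} = 1 / s_b
    w = w / w.sum()
    return float(w @ draws), draws, w, s_b

# Flat prior and bias-reducing prior 1/sigma^4, same innovations (same seed)
est_u, draws, w_u, s_b = reverse_sampler_sigma2(2.0, T=20)
est_r, _, w_r, _ = reverse_sampler_sigma2(2.0, T=20, prior=lambda s2: s2 ** -2.0)
est_no_jac, _, _, _ = reverse_sampler_sigma2(2.0, T=20, use_jacobian=False)
```

Under these assumptions the weighted averages reduce to the closed forms for $\hat\sigma^{2,U}_{RS}$ and $\hat\sigma^{2,R}_{RS}$ given earlier, while dropping the Jacobian term (`use_jacobian=False`) targets a different distribution.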


Table 2: Properties of the Estimators 



i ki(S,T) = ( ’ S ^ T l^_i 2 ) 2 (sff-i)- 4 ) 2 ' ) > b Kl tends to one as B, S tend to infinity. 

ii k lt = c^M(-c L t), c 2 lt = f, k lt ->0asT->oo. 

iii kslt = klt ■ S T ■ InvxIfT-p, A ,slt = 2a 4 vsa-(KsLT) + 4a 4 ^^cov(kslt, S • TlnvXs(T-i)))- 


The properties of the estimators are summarized in Table 2. Recall that
increasing $S$ improves the approximation of the binding function in SMD estimation while increasing
$B$ improves the approximation to the target distribution in Bayesian type estimation. For fixed $T$,
only the Bayesian estimator with the bias reducing prior is unbiased. The SMD and RS (with bias
reducing prior) have the same bias and mean-squared error in agreement with the analysis in the
previous section. These two estimators have smaller errors than the RS estimator with a uniform
prior. The SLT posterior mean differs from that of the SMD by $\kappa_{SLT}$ that is not mean-zero. This
term, which is a function of the Mills-ratio, arises as a consequence of the fact that the $\sigma^2$ in SLT
are drawn from the normal distribution and then truncated to ensure positivity.

6.2 The Dynamic Panel Model with Fixed Effects

The dynamic panel model $y_{it} = \alpha_i + \rho y_{it-1} + \sigma e_{it}$ is known to suffer from severe bias when $T$ is
small because the unobserved heterogeneity $\alpha_i$ is imprecisely estimated. Various analytical bias
corrections have been suggested to improve the precision of the least squares dummy variable
(LSDV) estimator $\hat\rho$. Instrumental variable estimators have also been considered. Hsiao (2003)
provides a detailed account on the treatment of this incidental parameter problem. Gourieroux
et al. (2010) suggest exploiting the bias reduction properties of the indirect inference estimator
using the dynamic panel model as auxiliary equation. That is, $\psi(\theta) = \theta$. The authors reported
estimates of $\rho$ that are sharply more accurate in simulation experiments that hold $\sigma^2$ fixed. The
results continue to be impressive when an exogenous regressor and a linear trend are added to the
model. We reconsider their exercise but also estimate $\sigma^2$.

With $\theta = (\rho, \beta, \sigma^2)'$, we simulate data from the model:

$$y_{it} = \alpha_i + \rho y_{it-1} + \beta x_{it} + \sigma\varepsilon_{it}.$$

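Let $A = I_T - 1_T 1_T'/T$, $\underline{A} = A \otimes I_T$, $\tilde y = \underline{A}\,\mathrm{vec}(y)$, $\tilde y_{-1} = \underline{A}\,\mathrm{vec}(y_{-1})$, $\tilde x = \underline{A}\,\mathrm{vec}(x)$, where $y_{-1}$ are
the lagged $y$, we use the following moment conditions:

$$\bar g(\rho, \beta, \sigma^2) = \begin{pmatrix} \tilde y_{-1}'(\tilde y - \rho\tilde y_{-1} - \beta\tilde x) \\ \tilde x'(\tilde y - \rho\tilde y_{-1} - \beta\tilde x) \\ (\tilde y - \rho\tilde y_{-1} - \beta\tilde x)^2 - \sigma^2(1 - 1/T) \end{pmatrix}$$

with $\bar g(\hat\rho, \hat\beta, \hat\sigma^2) = 0$. The MD estimator is thus also the LSDV. The quantity $\bar g_S(\theta)$ for SMD and
$\bar g^b(\theta)$ for ABC are defined analogously. For this model, Bayesian inference is possible since the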
likelihood in de-meaned data is

$$L(\tilde y, \tilde x|\theta) = \frac{1}{\sqrt{2\pi|\sigma^2\Omega|}^{\,N}}\exp\left(-\frac{1}{2\sigma^2}\sum_{i=1}^N (\tilde y_i - \rho\tilde y_{i,-1} - \beta\tilde x_i)'\,\Omega^{-1}\,(\tilde y_i - \rho\tilde y_{i,-1} - \beta\tilde x_i)\right)$$

where $\Omega = I_{T-1} - 1_{T-1}1_{T-1}'/T$. For LT, ABC-MCMC, SMD the weighting matrix is computed
as: $W = \left(\frac{1}{NT}\sum_{i,t} g_{it}g_{it}' - \bar g\bar g'\right)^{-1}$. Recall that while the weighting matrix is irrelevant to finding
the mode in exactly identified models, $W$ affects computation of the posterior mean. The prior
is $\pi(\theta) = \mathbb{1}_{\sigma^2 > 0,\ \rho\in[-1,1]}$. For SMD, the innovations $\varepsilon^s$ used to construct $\hat\psi^s$ are drawn from the
standard normal distribution once and held fixed.


Table 3 reports results from 5000 replications for $T = 6$ time periods and $N = 100$ cross-section
units, as in Gourieroux et al. (2010). Both $\hat\rho$ and $\hat\sigma^2$ are significantly biased. The mean estimate
of the long run multiplier (not reported) is only 1.6 when the true value is 2.5. The LT is the
same as the MD except that it is computed using Bayesian tools. Hence its properties are similar
to the MD. The simulation estimators have much improved properties. The properties of $\hat\theta_{RS}$ are
similar to those of the SMD. Figure 2 illustrates for one simulated dataset how the posteriors for
RS/SLT are shifted compared to the one based on the direct likelihood.

Figure 2: Frequentist, Bayesian, and Approximate Bayesian Inference for $\rho$

[Plot of the posterior for $\rho$; legend: RS, SLT, BC, SMD.]

$p_{BC}(\rho|\hat\psi)$ is the likelihood based Bayesian posterior distribution,
$p_{SLT}(\rho|\hat\psi)$ is the Simulated Laplace type quasi-posterior distribution,
$p_{RS}(\rho|\hat\psi)$ is the approximate posterior distribution based on the RS.
The frequentist distribution of $\hat\theta_{SMD}$ is estimated by $\mathcal{N}(\hat\theta_{SMD}, \widehat{\mathrm{var}}(\hat\theta_{SMD}))$.

Table 3: Dynamic Panel $\rho = 0.6$, $\beta = 1$, $\sigma^2 = 2$

Mean over 1000 replications

| | | MLE | LT | SLT | SMD | ABC | RS | Bootstrap |
|---|---|---|---|---|---|---|---|---|
| $\hat\rho$: | Mean | 0.419 | 0.419 | 0.593 | 0.598 | 0.588 | 0.599 | 0.419 |
| | SD | 0.037 | 0.037 | 0.036 | 0.035 | 0.036 | 0.035 | 0.074 |
| | Bias | -0.181 | -0.181 | -0.007 | -0.002 | -0.012 | -0.001 | -0.181 |
| $\hat\beta$: | Mean | 0.940 | 0.940 | 0.997 | 1.000 | 0.995 | 1.000 | 0.940 |
| | SD | 0.070 | 0.071 | 0.073 | 0.073 | 0.073 | 0.073 | 0.139 |
| | Bias | -0.060 | -0.060 | -0.003 | 0.000 | -0.005 | 0.000 | -0.060 |
| $\hat\sigma^2$: | Mean | 1.869 | 1.878 | 1.973 | 1.989 | 2.055 | 2.099 | 1.869 |
| | SD | 0.133 | 0.146 | 0.144 | 0.144 | 0.150 | 0.152 | 0.267 |
| | Bias | -0.131 | -0.122 | -0.027 | -0.011 | 0.055 | 0.099 | -0.131 |
| | S | - | - | 500 | 500 | 1 | 1 | - |
| | B | - | 500 | 500 | - | 500 | 500 | 500 |

Note: MLE=MD. The ABC is MCMC-ABC with $\delta_{ABC} = 0.025$, keeping every 500-th draw.

7 Conclusion

Different disciplines have developed different estimators to overcome the limitations posed by an
intractable likelihood. These estimators share many similarities: they rely on auxiliary statistics
and use simulations to approximate quantities that have no closed form expression. We suggest
an optimization framework that helps understand the estimators from the perspective of classical
minimum distance estimation. All estimators are first-order equivalent as $S \to \infty$ and $T \to \infty$ for
any choice of $\pi(\theta)$. Nonetheless, up to order $1/T$, the estimators are distinguished by biases due
to the prior and approximation of the mode by the mean, the very two features that distinguish
Bayesian and frequentist estimation.

We have only considered regular problems when 6 o is in the interior of 0 and the objective 
function is differentiable. When these conditions fail, the posterior is no longer asymptotically 
normal around the MLE with variance equal to the inverse of the Fisher Information Matrix. 
Understanding the properties of these estimators under non-standard conditions is the subject for 
future research. 
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Appendix 


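The terms $A(\theta)$ and $C(\theta)$ in $\hat\theta_{MD}$ are derived for the just identified case as follows. Recall that $\hat\psi$ has a
second order expansion:

$$\hat\psi = \psi(\theta_0) + \frac{\mathbb{A}(\theta_0)}{\sqrt{T}} + \frac{\mathbb{C}(\theta_0)}{T} + o_p\Big(\frac{1}{T}\Big). \tag{A.1}$$

Now $\hat\theta = \theta_0 + \frac{A(\theta_0)}{\sqrt{T}} + \frac{C(\theta_0)}{T} + o_p(\frac{1}{T})$. Thus expanding $\psi(\hat\theta)$ around $\hat\theta = \theta_0$:

$$\psi(\hat\theta) = \psi(\theta_0) + \psi_\theta(\theta_0)\Big(\frac{A(\theta_0)}{\sqrt{T}} + \frac{C(\theta_0)}{T} + o_p\Big(\frac{1}{T}\Big)\Big) + \frac{1}{2T}\sum_{j=1}^K \psi_{\theta,\theta_j}(\theta_0) A(\theta_0) A_j(\theta_0) + o_p\Big(\frac{1}{T}\Big).$$

Equating with $\psi(\theta_0) + \frac{\mathbb{A}(\theta_0)}{\sqrt{T}} + \frac{\mathbb{C}(\theta_0)}{T} + o_p(\frac{1}{T})$ and solving for $A$, $C$ we get:

$$A(\theta_0) = [\psi_\theta(\theta_0)]^{-1}\,\mathbb{A}(\theta_0)$$

$$C(\theta_0) = [\psi_\theta(\theta_0)]^{-1}\Big(\mathbb{C}(\theta_0) - \frac{1}{2}\sum_{j=1}^K \psi_{\theta,\theta_j}(\theta_0) A(\theta_0) A_j(\theta_0)\Big).$$
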
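As an illustrative check of this second order expansion (not part of the original derivation), the sketch below takes a scalar toy binding function $\psi(\theta) = \theta^2$, for which $\hat\theta$ can be solved exactly, and compares it with $\theta_0 + A/\sqrt{T} + C/T$. The function name `md_expansion_error` and the fixed numbers standing in for $\mathbb{A}(\theta_0)$ and $\mathbb{C}(\theta_0)$ are assumptions made only for this example.

```python
import math

def md_expansion_error(theta0, A_psi, C_psi, T):
    """Exact theta_hat minus its second order expansion, for psi(theta) = theta**2.

    A_psi and C_psi stand in for the auxiliary-statistic terms of (A.1);
    they are treated as fixed numbers here (an illustrative assumption).
    """
    psi_hat = theta0 ** 2 + A_psi / math.sqrt(T) + C_psi / T
    theta_hat = math.sqrt(psi_hat)        # exact solution of psi(theta) = psi_hat
    psi_theta = 2.0 * theta0              # first derivative of psi at theta0
    psi_theta_theta = 2.0                 # second derivative of psi
    A = A_psi / psi_theta
    C = (C_psi - 0.5 * psi_theta_theta * A * A) / psi_theta
    approx = theta0 + A / math.sqrt(T) + C / T
    return theta_hat - approx

for T in (100, 10_000):
    err = md_expansion_error(1.0, 0.8, -0.3, T)
    print(T, err * T)   # T * error shrinks as T grows, consistent with an o(1/T) remainder
```

The scaled error `err * T` should fall roughly like $1/\sqrt{T}$, which is what a remainder of order $T^{-3/2}$ implies.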


For estimator specific and a^, define = trace(['0 e (0 o )] 1 EE V’e.e,-(^o)^ (0 O ) + A^ e (0 o )]) 


Cf(0 o ) — 2 —^-yA d (0 o )ad(^o)^o — ad(9o) 2 9o — 
^[9oJ 


j =1 ' 




r(^o) 2 


A^A/A AA 


-^X>dA)-^A))4A)- 


(A.2) 


6=1 


where $\bar a_d = \frac{1}{B}\sum_{b=1}^B a_d^b$, and $\bar A_d$ is defined analogously. Note that $\bar a_d(\theta_0) \to 0$ as $B \to \infty$ if $\psi(\theta) = \theta$ and the first
two terms drop out.


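A.1 Proof of Proposition 1, RS

To prove Proposition 1 we need an expansion for $\hat\psi^b(\theta^b)$ and the weights using

$$\theta^b = \theta_0 + \frac{A^b(\theta_0)}{\sqrt{T}} + \frac{C^b(\theta_0)}{T} + o_p\Big(\frac{1}{T}\Big). \tag{A.3}$$

i. Expansion of $\hat\psi^b(\theta^b)$ and $\hat\psi^b_\theta(\theta^b)$:

$$\begin{aligned}
\hat\psi^b(\theta^b) &= \psi(\theta^b) + \frac{\mathbb{A}^b(\theta^b)}{\sqrt{T}} + \frac{\mathbb{C}^b(\theta^b)}{T} + o_p\Big(\frac{1}{T}\Big) \\
&= \psi\Big(\theta_0 + \frac{A^b(\theta_0)}{\sqrt{T}} + \frac{C^b(\theta_0)}{T} + o_p\Big(\frac{1}{T}\Big)\Big) + \frac{1}{\sqrt{T}}\,\mathbb{A}^b\Big(\theta_0 + \frac{A^b(\theta_0)}{\sqrt{T}} + \frac{C^b(\theta_0)}{T} + o_p\Big(\frac{1}{T}\Big)\Big) \\
&\quad + \frac{1}{T}\,\mathbb{C}^b\Big(\theta_0 + \frac{A^b(\theta_0)}{\sqrt{T}} + \frac{C^b(\theta_0)}{T} + o_p\Big(\frac{1}{T}\Big)\Big) + o_p\Big(\frac{1}{T}\Big) \\
&= \psi(\theta_0) + \frac{\mathbb{A}^b(\theta_0)}{\sqrt{T}} + \frac{\psi_\theta(\theta_0)A^b(\theta_0)}{\sqrt{T}} + \frac{\mathbb{C}^b(\theta_0)}{T} + \frac{\mathbb{A}^b_\theta(\theta_0)A^b(\theta_0)}{T} + \frac{\psi_\theta(\theta_0)C^b(\theta_0)}{T} \\
&\quad + \frac{1}{2}\sum_{j=1}^K \frac{\psi_{\theta,\theta_j}(\theta_0)A^b(\theta_0)A^b_j(\theta_0)}{T} + o_p\Big(\frac{1}{T}\Big).
\end{aligned}$$

Since $\hat\psi^b(\theta^b)$ equals $\hat\psi$ for all $b$, it follows that

$$A^b(\theta_0) = [\psi_\theta(\theta_0)]^{-1}\big(\mathbb{A}(\theta_0) - \mathbb{A}^b(\theta_0)\big) \tag{A.4}$$

$$C^b(\theta_0) = [\psi_\theta(\theta_0)]^{-1}\Big(\mathbb{C}(\theta_0) - \mathbb{C}^b(\theta_0) - \frac{1}{2}\sum_{j=1}^K \psi_{\theta,\theta_j}(\theta_0)A^b(\theta_0)A^b_j(\theta_0) - \mathbb{A}^b_\theta(\theta_0)A^b(\theta_0)\Big). \tag{A.5}$$

Likewise,

$$\begin{aligned}
\hat\psi^b_\theta(\theta^b) &= \psi_\theta\Big(\theta_0 + \frac{A^b(\theta_0)}{\sqrt{T}} + \frac{C^b(\theta_0)}{T} + o_p\Big(\frac{1}{T}\Big)\Big) + \frac{1}{\sqrt{T}}\,\mathbb{A}^b_\theta\Big(\theta_0 + \frac{A^b(\theta_0)}{\sqrt{T}} + \frac{C^b(\theta_0)}{T} + o_p\Big(\frac{1}{T}\Big)\Big) \\
&\quad + \frac{1}{T}\,\mathbb{C}^b_\theta\Big(\theta_0 + \frac{A^b(\theta_0)}{\sqrt{T}} + \frac{C^b(\theta_0)}{T} + o_p\Big(\frac{1}{T}\Big)\Big) + o_p\Big(\frac{1}{T}\Big) \\
&= \psi_\theta(\theta_0) + \sum_{j=1}^K \frac{\psi_{\theta,\theta_j}(\theta_0)A^b_j(\theta_0)}{\sqrt{T}} + \frac{\mathbb{A}^b_\theta(\theta_0)}{\sqrt{T}} + \frac{1}{2}\sum_{j=1}^K\sum_{k=1}^K \frac{\psi_{\theta,\theta_j,\theta_k}(\theta_0)A^b_j(\theta_0)A^b_k(\theta_0)}{T} \\
&\quad + \sum_{j=1}^K \frac{\psi_{\theta,\theta_j}(\theta_0)C^b_j(\theta_0)}{T} + \sum_{j=1}^K \frac{\mathbb{A}^b_{\theta,\theta_j}(\theta_0)A^b_j(\theta_0)}{T} + \frac{\mathbb{C}^b_\theta(\theta_0)}{T} + o_p\Big(\frac{1}{T}\Big).
\end{aligned}$$
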

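To obtain the determinant of $\hat\psi^b_\theta(\theta^b)$, let $a^b(\theta_0) = \mathrm{trace}(\mathcal{A}^b(\theta_0))$, $a_2^b(\theta_0) = \mathrm{trace}(\mathcal{A}^b(\theta_0)^2)$, $c^b(\theta_0) = \mathrm{trace}(\mathcal{C}^b(\theta_0))$,
where

$$\mathcal{A}^b(\theta_0) = [\psi_\theta(\theta_0)]^{-1}\Big(\sum_{j=1}^K \psi_{\theta,\theta_j}(\theta_0)A^b_j(\theta_0) + \mathbb{A}^b_\theta(\theta_0)\Big)$$

$$\mathcal{C}^b(\theta_0) = [\psi_\theta(\theta_0)]^{-1}\Big(\frac{1}{2}\sum_{j=1}^K\sum_{k=1}^K \psi_{\theta,\theta_j,\theta_k}(\theta_0)A^b_j(\theta_0)A^b_k(\theta_0) + \sum_{j=1}^K \psi_{\theta,\theta_j}(\theta_0)C^b_j(\theta_0) + \sum_{j=1}^K \mathbb{A}^b_{\theta,\theta_j}(\theta_0)A^b_j(\theta_0) + \mathbb{C}^b_\theta(\theta_0)\Big).$$
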

Now for any matrix $X$ with all eigenvalues smaller than 1 in modulus we have: $\log(I_K + X) = X - \frac{1}{2}X^2 + o(\|X\|^2)$.
Furthermore, for any matrix $M$ the determinant is $|M| = \exp(\mathrm{trace}(\log M))$. Together, these imply that for
arbitrary $X_1$, $X_2$:


T X l X 2 ,1, 

+ Vt + -t + °^t ] 


= exp 

= 1 + 


( (Xl X 2 X? , 1 

( trace ( —+ -=r 


+ °p(tf) 


y/T T ' T ' “^T 
trace (Xi) trace (X 2 ) trace (Xi) 


Vt 


T 


T 


+ °p( yO- 
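The two facts used here, $|M| = \exp(\mathrm{trace}(\log M))$ and the quadratic truncation of $\log(I_K + X)$, can be checked numerically. The sketch below is only an illustration under assumed inputs: the matrices `X1`, `X2`, the dimension `K` and the helper names are hypothetical, and $\mathrm{trace}(\log M)$ is computed as the sum of the log eigenvalues.

```python
import numpy as np

def det_via_trace_log(M):
    """det(M) as exp(trace(log M)); trace(log M) equals the sum of log eigenvalues."""
    eigvals = np.linalg.eigvals(M).astype(complex)
    return float(np.real(np.exp(np.sum(np.log(eigvals)))))

def det_second_order(X):
    """exp(trace(X - X^2/2)): det(I + X) with log(I + X) cut after the quadratic term."""
    return float(np.exp(np.trace(X - 0.5 * X @ X)))

rng = np.random.default_rng(0)
K = 3
X1 = rng.standard_normal((K, K))   # plays the role of the 1/sqrt(T) term
X2 = rng.standard_normal((K, K))   # plays the role of the 1/T term

def truncation_error(T):
    """Absolute gap between det(I + X1/sqrt(T) + X2/T) and its quadratic approximation."""
    X = X1 / np.sqrt(T) + X2 / T
    return abs(np.linalg.det(np.eye(K) + X) - det_second_order(X))

print(det_via_trace_log(np.eye(K) + X1 / 10), np.linalg.det(np.eye(K) + X1 / 10))
print(truncation_error(100.0), truncation_error(10_000.0))
```

The truncation error is driven by the omitted cubic term of the log series, so it should be of order $T^{-3/2}$ and therefore negligible relative to the $1/T$ terms retained in the expansion.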


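Hence the required determinant is

$$|\hat\psi^b_\theta(\theta^b)| = |\psi_\theta(\theta_0)|\,\Big|I + \frac{\mathcal{A}^b(\theta_0)}{\sqrt{T}} + \frac{\mathcal{C}^b(\theta_0)}{T} + o_p\Big(\frac{1}{T}\Big)\Big| = |\psi_\theta(\theta_0)|\Big(1 + \frac{a^b(\theta_0)}{\sqrt{T}} + \frac{a_2^b(\theta_0)}{T} + \frac{c^b(\theta_0)}{T} + o_p\Big(\frac{1}{T}\Big)\Big).$$
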


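ii. Expansion of $w^b(\theta^b) = |\hat\psi^b_\theta(\theta^b)|^{-1}\pi(\theta^b)$:

$$\begin{aligned}
|\hat\psi^b_\theta(\theta^b)|^{-1}\pi(\theta^b) &= |\psi_\theta(\theta_0)|^{-1}\Big(1 + \frac{a^b(\theta_0)}{\sqrt{T}} + \frac{a_2^b(\theta_0)}{T} + \frac{c^b(\theta_0)}{T} + o_p\Big(\frac{1}{T}\Big)\Big)^{-1}\pi(\theta^b) \\
&= |\psi_\theta(\theta_0)|^{-1}\Big(1 - \frac{a^b(\theta_0)}{\sqrt{T}} - \frac{a_2^b(\theta_0)}{T} - \frac{c^b(\theta_0)}{T} + o_p\Big(\frac{1}{T}\Big)\Big) \\
&\quad \times \Big(\pi(\theta_0) + \pi_\theta(\theta_0)\frac{A^b(\theta_0)}{\sqrt{T}} + \pi_\theta(\theta_0)\frac{C^b(\theta_0)}{T} + \frac{1}{2}\sum_{j=1}^K \frac{\pi_{\theta,\theta_j}(\theta_0)A^b(\theta_0)A^b_j(\theta_0)}{T} + o_p\Big(\frac{1}{T}\Big)\Big) \\
&= |\psi_\theta(\theta_0)|^{-1}\pi(\theta_0)\Big(1 - \frac{a^b(\theta_0)}{\sqrt{T}} + \frac{\pi_\theta(\theta_0)}{\pi(\theta_0)}\frac{A^b(\theta_0)}{\sqrt{T}} - \frac{a_2^b(\theta_0)}{T} - \frac{c^b(\theta_0)}{T} - \frac{\pi_\theta(\theta_0)}{\pi(\theta_0)}\frac{a^b(\theta_0)A^b(\theta_0)}{T} \\
&\quad + \frac{\pi_\theta(\theta_0)}{\pi(\theta_0)}\frac{C^b(\theta_0)}{T} + \frac{1}{2}\frac{A^b(\theta_0)\pi_{\theta,\theta'}(\theta_0)A^{b\prime}(\theta_0)}{\pi(\theta_0)\,T} + o_p\Big(\frac{1}{T}\Big)\Big).
\end{aligned}$$

Now $\bar A(\theta_0) = \frac{1}{B}\sum_{b=1}^B A^b(\theta_0)$. Similarly define $\bar C(\theta_0) = \frac{1}{B}\sum_{b=1}^B C^b(\theta_0)$. Also, denote the term in $1/T$ by:

$$e^b(\theta_0) = -a_2^b(\theta_0) - c^b(\theta_0) - \frac{\pi_\theta(\theta_0)}{\pi(\theta_0)}a^b(\theta_0)A^b(\theta_0) + \frac{\pi_\theta(\theta_0)}{\pi(\theta_0)}C^b(\theta_0) + \frac{1}{2}A^b(\theta_0)\frac{\pi_{\theta,\theta'}(\theta_0)}{\pi(\theta_0)}A^{b\prime}(\theta_0).$$

The normalized weight for draw $b$ is:
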


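$$\begin{aligned}
\bar w^b(\theta^b) &= \frac{|\hat\psi^b_\theta(\theta^b)|^{-1}\pi(\theta^b)}{\sum_{c=1}^B |\hat\psi^c_\theta(\theta^c)|^{-1}\pi(\theta^c)} \\
&= \frac{1 - \frac{a^b(\theta_0)}{\sqrt{T}} + \frac{\pi_\theta(\theta_0)}{\pi(\theta_0)}\frac{A^b(\theta_0)}{\sqrt{T}} + \frac{e^b(\theta_0)}{T} + o_p(\frac{1}{T})}{B\Big[1 - \frac{\bar a(\theta_0)}{\sqrt{T}} + \frac{\pi_\theta(\theta_0)}{\pi(\theta_0)}\frac{\bar A(\theta_0)}{\sqrt{T}} + \frac{\bar e(\theta_0)}{T} + o_p(\frac{1}{T})\Big]} \\
&= \frac{1}{B}\Big(1 - \frac{a^b(\theta_0)}{\sqrt{T}} + \frac{\pi_\theta(\theta_0)}{\pi(\theta_0)}\frac{A^b(\theta_0)}{\sqrt{T}} + \frac{e^b(\theta_0)}{T} + o_p\Big(\frac{1}{T}\Big)\Big) \times \Big(1 + \frac{\bar a(\theta_0)}{\sqrt{T}} - \frac{\pi_\theta(\theta_0)}{\pi(\theta_0)}\frac{\bar A(\theta_0)}{\sqrt{T}} - \frac{\bar e(\theta_0)}{T} + o_p\Big(\frac{1}{T}\Big)\Big) \\
&= \frac{1}{B}\Big(1 - \frac{a^b(\theta_0) - \bar a(\theta_0)}{\sqrt{T}} + \frac{\pi_\theta(\theta_0)}{\pi(\theta_0)}\frac{A^b(\theta_0) - \bar A(\theta_0)}{\sqrt{T}} + \frac{e^b(\theta_0) - \bar e(\theta_0)}{T} - \frac{a^b(\theta_0)\bar a(\theta_0)}{T} + \frac{\pi_\theta(\theta_0)}{\pi(\theta_0)}\frac{A^b(\theta_0)\bar a(\theta_0)}{T} \\
&\quad + \frac{\pi_\theta(\theta_0)}{\pi(\theta_0)}\frac{\bar A(\theta_0)a^b(\theta_0)}{T} - \frac{\pi_\theta(\theta_0)\pi_\theta(\theta_0)'}{\pi(\theta_0)^2}\frac{A^b(\theta_0)'\bar A(\theta_0)}{T} + o_p\Big(\frac{1}{T}\Big)\Big),
\end{aligned}$$

where $\bar a$, $\bar A$ and $\bar e$ denote averages of $a^b$, $A^b$ and $e^b$ over the $B$ draws.
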

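The posterior mean is $\bar\theta_{RS} = \sum_{b=1}^B \bar w^b(\theta^b)\theta^b$. Using $\theta^b$ defined in (A.3), and $A^b$ and $C^b$ defined in (A.4) and (A.5):

$$\bar\theta_{RS} = \theta_0 + \frac{1}{B}\sum_{b=1}^B \frac{A^b(\theta_0)}{\sqrt{T}} + \frac{1}{B}\sum_{b=1}^B \frac{C^b(\theta_0)}{T} + \frac{\pi_\theta(\theta_0)}{\pi(\theta_0)}\frac{1}{B}\sum_{b=1}^B \frac{(A^b(\theta_0) - \bar A(\theta_0))A^b(\theta_0)}{T} + \frac{C^M(\theta_0)}{T} + o_p\Big(\frac{1}{T}\Big).$$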
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B.l Proof of Results for LT 


From

$$\theta^b = \theta_0 + \frac{A^b(\theta_0)}{\sqrt{T}} + \frac{C^b(\theta_0)}{T} + o_p\Big(\frac{1}{T}\Big),$$

we have, given that $\theta^b$ is drawn from the asymptotic distribution of $\hat\theta$,


r(0 b ) = 0(0 




,/ A b (0 o ) C fc (0o) ,1A . A^(0 o + ^ + ^ol+ Op (i)) 

= v 'r + ^ 7 r + ^ + 0 p( T ) J + - VT - 

A^(0 O ) 0 e (0 o )A 6 (0 o ) A^(0o)A b (0 o ) 1 A 0 0A 00 o )A b (0 o )A b (0o) 1 

= V’IC'o) + - J=- + -Tip- T - + ,L - r - + ° P W> 


which is equal to 0 for all b. Hence 


A\e o) = [^(0o)] 1 (A(0o)-A^ o (0o)) 


C^o) = 


<c(0 o ) - - J2^A d o)A b {e 0 )A b J (e 0 ) - Al^e 0 )A b (e 0 ) 


Note that the bias term $C^b$ depends on the bias term $\mathbb{C}$. For the weights, we need to consider


aw - *(«. + ^ + ^ + Mi)) + ^ 


^0) , C b (e o) . /_i 

A Afi “T t ' U P\T 


, (a ^ ^ 0^(0o)A b (0 o ) ^ A^(0 O ) * 0 e , 9j (0o)A)(0 o ) ^ A &1,e,e^Vo) 

= ^W + 2.-^-+ ^5^ + N- t - + 0-r- 

j=l j=l j=l 

1 A 1’eft j ,e h mA b j (0o)A b k {0 o ) , .1. 

+ 2 £ ---y- + o P {ji). 


A b (9 0 ) = 0, 


A£(0o) + £0^o)A b (0 o ) 

f=i 


C b (0o) = [0 9 (0o)] £>,<>, (0o)C£(0 o ) + Y, Ab oo,e, 0 ,( 00 )A b (0o) + ^E£ ^e, Sj , ek (0o)A b (9 o )A b k (9 o ) 

\i=i i =1 i =1 fc=i 

a b (0 o ) = trace(A b (0o))) a \{^o) — trace(A b (6 ) 0 ) 2 ), c b (0 o ) = trace(C b (0 o ))- 


The determinant is 


-i r , A b (0 o ) , C b (9 0 ) , ^ 1 ^ - 1 

-< d —I-^-b°p(^;) — we\%) 

- 1 ^ a b (gp) c&go) c 6 (fl 0 ) , ^ 1\ 

V Vt t t A t )J ■ 


1 , a b (9 0 ) , a*(0 o ) , c & (0o) , ,1, 

1 + ^r + — + — +0 'A ] 
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The prior is 


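$$\begin{aligned}
\pi(\theta^b) &= \pi\Big(\theta_0 + \frac{A^b(\theta_0)}{\sqrt{T}} + \frac{C^b(\theta_0)}{T} + o_p\Big(\frac{1}{T}\Big)\Big) \\
&= \pi(\theta_0) + \pi_\theta(\theta_0)\frac{A^b(\theta_0)}{\sqrt{T}} + \pi_\theta(\theta_0)\frac{C^b(\theta_0)}{T} + \frac{1}{2}\frac{A^b(\theta_0)\pi_{\theta,\theta'}(\theta_0)A^{b\prime}(\theta_0)}{T} + o_p\Big(\frac{1}{T}\Big).
\end{aligned}$$
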


Let: e b (6o) = — c b (0q) — a\ (#o) + C b (Oq) + A b (Oq) (0q) A b '( 0 q ). After some simplification, the product 


is 


$g(9 0 ) n(8 b ) = i/)e{9 0 ) 7r(0 o )( 


1 a b (8 0 ) , ^(0 o )^ b (0o) , e b (9 0 ) , ,1 

VT 7t( 0 o ) Vt T + ° p[ T 


Hence, the normalized weight for draw b is 




\(pb) 1 a b (0o) n g (0 o )A b (0 o ) e b (6 0 ) , /In 

1 ' 1 1 n/t 1 7T(0 O ) n/T 1 T +0 P\T) 

Z^c= 1 

fy(8o) 

1 /J3 .N. B i a( s o) , fffl(So) ^4(So) , e(0 o ) , / 1 \ 

ir{O c ) 1 + 7T(0 O ) VT + T +Op{ T ) 


1 (, a b (9 0 ) , 7r e (0 o ) dl fc (d 0 ) , e b (d 0 ) , , 1 A , a(0 O ) ^o) A(0 O ) 

Vt n(8 0 ) Vt t + °p [ t ) ){ + yr n(0 o ) VT 

1/ a b (9o) — a(9o) ng(9 0 ) A b {9 0 ) -A(0 O ) e b (8 0 )-e(8 0 ) a b (8 0 )a(9 0 ) 

HV Vt tt(^o) Vt t t 

Tre(^o) a b (6»o)A(6> 0 ) , 7re(0 o ) a(0 o )A b (0 o ) , V V 
+ 7t(8o) T + n(8 0 ) T +^ T >)- 



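Hence the posterior mean is $\bar\theta_{LT} = \sum_{b=1}^B \bar w^b(\theta^b)\theta^b$ with $\theta^b = \theta_0 + \frac{A^b(\theta_0)}{\sqrt{T}} + \frac{C^b(\theta_0)}{T} + o_p(\frac{1}{T})$. After simplification, we have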


a a i A(9 0 ) C(9 0 ) 1 (a b (9 0 ) ~ a(9 0 ))A b (9 0 ) [ n (g 0 ) ^(^o)] 9 0 1 7Tg(0o) v'' b (9 0 ) — A(9o))A b (9 0 ) 

9 LT = 8 0 + — + — --^ -T- T - + bV{9o)^ - 


Vt 


T 


i>=1 


b -1 


T 


«( 6 > o) 2 6 >q ; n n e (d 0 ) a(9 0 )A(9 0 )9 0 H 
T + 7t( 6» 0 ) T + ° pl T j 


= PQ 


A(d 0 ) , C(6 0 ) , 7t 9 (6» 0 ) 1 ^ (A b (0 o ) - A(0 o ))A & (0o) , n , / V 

VvvhL -v- + C ( 0 o) + Op (^), 

6=1 


Vt 


T 


T 


where all terms are based on A b (9o) defined in (B.l) and C b (9o) in (B.2). 


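C.1 Results for SLT:

From

$$\hat\psi^s(\theta) = \psi(\theta) + \frac{\mathbb{A}^s(\theta)}{\sqrt{T}} + \frac{\mathbb{C}^s(\theta)}{T} + o_p\Big(\frac{1}{T}\Big)$$

$$\theta^b = \theta_0 + \frac{A^b(\theta_0)}{\sqrt{T}} + \frac{C^b(\theta_0)}{T} + o_p\Big(\frac{1}{T}\Big),$$

we have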


2 . (cb) 1 vW, . A\e 0 ) c\e o) i \ a^o + ^ + ^ + 0p (^)) 

^ V 0 ^ ( rV + -Vr- 

// fl \ , 1 A S (0 O ) A^(0 O ) , ^ 6 (^o) 1 y^ Ag(0 o )A fc (0 o ) A^ g (0 o )A b (9 o ) 

= Wo) + y 2^ —7^ + —77^ + + y E- r -+- t - 


Sj=[ VT VT ^ u ' VT _ s=1 

* A b (e 0 )A b 1 (e 0 ) . , x c*(0 o ) 


1 V- (D s ($o) 1 v - ^ in \ 

S E T + 2 E ( e °) 

«=1 3 = 1 


T 


+ ^e(0 o )- 


T 


l 1 ^ 

+ °p{ j)- 


Thus, 


A\e 0 ) = \MOo)] (A(0 o )-i^A s (0 o )-A^(0 o ) 


S =1 

S 


I< 


C\e 0 ) = v^o)^ <DW-^C 8 (6 »o)-^^( 0 o)^o)4-(^o) 


S =1 


i=i 




ipe(& 0 ) 


-^A?(0 o ) + A^(0o) 


S = 1 


A b (0 o ). 


(C.l) 


(C.2) 


Note that we have A^ ~ A/” while A s A- J\f. To compute the weight for draw b , consider 


rm = M 9 0 + 


A b (g 0 ) C b (9 0 ) 

VT t 


V)Vz 


^A-(9. + 7fi + ^+o P (i)) 


s=l 


VT 


^{oo + ^ + ^+oV^ 1 ^v s (oo + 4 5r i + c ^ + op(?)) 


Vt 1 5 ^ t +°p( T ) 


K 


s =1 

A^o) , 1 ^A|(0o) , A L, e (^o) lAc'W , E 


7 = 1 s =1 s=l 7=1 


ipe,eA^o)- 


T 


s k as 


A^.(0 o )A b (0 o ) , ^AV >Mj (0o)A b (0 o ) ,A|(0 o )A^o) , ,1, 

E E —- + E- ~t -+ A E E ^,»(A( J o)-^-+ °p(t)- 


5 ' T ^ T '2 

s=l .3=1 3=1 3=1 fe=l 


k T' 


Let: 


A b (0 o )= tW) 1 + 




l- 1 / 1 


S = 1 

S 


3 = 1 


A' 


5 
- 1 1 1 


s=1 


i=i 


C b (e 0 ) = V^o) [ 7E CS (W + E ^, Sj (^)C b (e 0 ) + - £ Al e .(0o)A b (9o) + A^ !M .( 0 o)A b (^o) 


V*(fo)l ( 7 E ^^(flo)4(floM5(flo)l 


j,/c=l 


a b (#o) = trace(*4. b (0o)) 5 a^flo) = trace(Vr (# 0 ) E c b (#o) = trace(C & (#o))- 
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The determinant is 


rm 


-l 


i>e(Oo) 


-1 


1 - 


a b (e o) a b 2 (0 o ) c b (0 o ) 




T 


T + °p{j) ) • 


Hence 


7f>tab\~ 1 ^iab\ ^ i a A°o) a b 2 {0 0 ) c b (9 0 ) , , 1^ 

V (0 ) ^(9 )= i> 9 {9 o) tt(6» 0 ) II - —j= - — - — + Op(-) 

M , 7T d (9 0 )A b (9 0 ) n e (0 o )C b (0 0 ) 1 ^ 7t Mj ( 0 o ) A b (0 o )A b (0 o ) 1 

x I + + 9 t +0 p^ 


tt(0o) Vt tt(^o) T 


2 ^ 7r(6»o) 


T 


, ^ “E,* ^ A a b (0o) , n e (6o)A b (9o) , e b (0 o ) , _ , 1, 

We\0a) tt(6»o) ( 1-+ 3751-7^r H-— + o p {-) 


where e b (0 o ) = -a b (9 0 )^A b (9 0 ) - a b (0 o ) - c b (0 o ) + ^C b (0 o ) + \ £f=i 
normalized weights are 


Vr tt(0o) y/T T ' ~ VK T' 

l ^K *<>.°j( e °) A b(g 0 ) A b,g 0 ) The 


" 6 (0 b ) = 


■0 b (6> b ) 1 7r(6» h ) 


Ec=l V ,C (^ C ) 7r(^ C ) 


-1 


B 


1 - 


'(00) , 7T e(0 o )A b (0 o ) , e b (0o ) , 1 


V 7 ? 7r(0o) Vt 


+ °p(Y 1 + 


o(0q) 7Te(0o) A(0 O ) e (0o) 


r 'TjrVr tt( 0 o ) Vt t ^ pV r- 


+ °p(^) 


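The posterior mean is $\bar\theta_{SLT} = \sum_{b=1}^B \bar w^b(\theta^b)\theta^b$ with $\theta^b = \theta_0 + \frac{A^b(\theta_0)}{\sqrt{T}} + \frac{C^b(\theta_0)}{T} + o_p(\frac{1}{T})$. After some simplification,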


9slt — ^0 ' 

+2 


A(9 0 ) , C(0„) , ^(0o) 1 ^ (A h (0 o ) - A(9 0 ))A b (9 0 ) 1 A (a b (0 o ) - a(0 o ))A b (8 o ) 


+ 


E 


Vt T tt( 0 q ) B T 

ne(9o) a(9o)A(9o)9o a 2 (0o)0o r 7r0 (0°)^r/a \i2 t7 o 


B 

--Y 

R 


6=1 


T 


7T(0o) 


T 


T 




, A(0 O ) , C(0 O ) , 7re(0o) 1 ^ (A b (0 o ) - A(0 o ))A b (0 o ) , , , ,1, 

= 0 n + ^ + _ + _-^-^-+C (0o) + o P (^) 


Vt 


T 


where terms in A and C are defined from (C.l) and (C.2). 
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D.l Results For The Example in Section 6.1 

The data generating process is $y_t = m_0 + \sigma_0 e_t$, $e_t \sim \text{iid } \mathcal{N}(0,1)$. As a matter of notation, a hat is used to
denote the mode, a bar denotes the mean, a superscript $s$ denotes a specific draw and a subscript $S$ denotes an
average over $S$ draws. For example, $\bar e_S = \frac{1}{ST}\sum_{s=1}^S\sum_{t=1}^T e_t^s = \frac{1}{S}\sum_{s=1}^S \bar e^s$.

MLE: Define $\bar e = \frac{1}{T}\sum_{t=1}^T e_t$. Then the mean estimator is $\hat m = m_0 + \sigma_0\bar e \sim \mathcal{N}(m_0, \sigma_0^2/T)$. For the variance
estimator, $\hat e = y - \hat m = \sigma_0(e - \bar e) = \sigma_0 M e$, where $M = I_T - 1(1'1)^{-1}1'$ is an idempotent matrix with $T - 1$ degrees
of freedom. Hence $\hat\sigma^2_{ML} = \hat e'\hat e/T \sim \sigma_0^2\chi^2_{T-1}/T$.
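The sampling distributions just stated can be illustrated by simulation. The sketch below is not part of the original study; the function name `simulate_mle` and the parameter values are chosen only for illustration. It checks that the variance MLE averages about $\sigma_0^2(T-1)/T$ and that $\hat m$ has variance about $\sigma_0^2/T$.

```python
import numpy as np

def simulate_mle(m0, sigma0, T, reps, seed=0):
    """Draw `reps` samples of size T from y_t = m0 + sigma0 * e_t and return the MLEs."""
    rng = np.random.default_rng(seed)
    y = m0 + sigma0 * rng.standard_normal((reps, T))
    m_hat = y.mean(axis=1)
    sigma2_hat = y.var(axis=1)     # ddof=0, so the sum of squares is divided by T
    return m_hat, sigma2_hat

T, sigma0_sq = 10, 2.0
m_hat, sigma2_hat = simulate_mle(m0=0.0, sigma0=np.sqrt(sigma0_sq), T=T, reps=200_000)
print(sigma2_hat.mean(), sigma0_sq * (T - 1) / T)   # simulated mean vs sigma0^2 (T-1)/T
print(m_hat.var(), sigma0_sq / T)                   # simulated variance vs sigma0^2 / T
```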


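BC: Expressed in terms of sufficient statistics $(\hat m, \hat\sigma^2)$, the joint density of $y$ is

$$p(y; m, \sigma^2) = \Big(\frac{1}{2\pi\sigma^2}\Big)^{T/2}\exp\Big(-\frac{\sum_{t=1}^T (m - \hat m)^2}{2\sigma^2}\Big) \times \exp\Big(-\frac{T\hat\sigma^2}{2\sigma^2}\Big).$$

The flat prior is $\pi(m, \sigma^2) \propto 1$. The marginal posterior distribution for $\sigma^2$ is $p(\sigma^2|y) = \int_{-\infty}^{\infty} p(y|m, \sigma^2)\,dm$.
Using the result that $\int_{-\infty}^{\infty}\exp\big(-\frac{T}{2\sigma^2}(m - \hat m)^2\big)dm = \sqrt{2\pi\sigma^2/T}$, we have

$$p(\sigma^2|y) \propto (2\pi\sigma^2)^{-(T-1)/2}\exp\Big(-\frac{T\hat\sigma^2}{2\sigma^2}\Big) \sim \mathrm{inv}\Gamma\Big(\frac{T-3}{2}, \frac{T\hat\sigma^2}{2}\Big).$$

The mean of an $\mathrm{inv}\Gamma(\alpha, \beta)$ is $\frac{\beta}{\alpha - 1}$. Hence the BC posterior mean is $\bar\sigma^2_{BC} = E(\sigma^2|y) = \hat\sigma^2\frac{T}{T-5}$.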
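A quick Monte Carlo check of this posterior mean is sketched below, under the stated inverse-gamma form. The helper names are hypothetical; inverse-gamma draws are obtained as reciprocals of gamma draws with shape $(T-3)/2$ and rate $T\hat\sigma^2/2$.

```python
import numpy as np

def bc_posterior_mean(sigma2_hat, T):
    """Closed-form posterior mean under the flat prior: sigma2_hat * T / (T - 5)."""
    return sigma2_hat * T / (T - 5)

def bc_posterior_draws(sigma2_hat, T, n, seed=0):
    """Draws from invGamma((T-3)/2, T*sigma2_hat/2) via reciprocals of gamma draws."""
    rng = np.random.default_rng(seed)
    alpha, beta = (T - 3) / 2.0, T * sigma2_hat / 2.0
    return 1.0 / rng.gamma(shape=alpha, scale=1.0 / beta, size=n)

draws = bc_posterior_draws(sigma2_hat=1.5, T=20, n=400_000)
print(draws.mean(), bc_posterior_mean(1.5, 20))   # Monte Carlo mean vs closed form
```

The closed form requires $T > 5$; for smaller samples the posterior mean under the flat prior does not exist.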


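SMD: The estimator equates the auxiliary statistics computed from the sample with the average of the
statistics over simulations. Given $\sigma$, the mean estimator $\hat m_S$ solves $\hat m = \hat m_S + \sigma\frac{1}{ST}\sum_{s=1}^S\sum_{t=1}^T e_t^s$. Since we use
sufficient statistics, $\hat m$ is the ML estimator. Thus, $\hat m_S \sim \mathcal{N}\big(m, \frac{\sigma^2}{T} + \frac{\sigma^2}{ST}\big)$. Since $y_t^s - \bar y^s = \sigma(e_t^s - \bar e^s)$, the
variance estimator $\hat\sigma^2_S$ is the $\sigma^2$ that solves $\hat\sigma^2 = \sigma^2\big(\frac{1}{ST}\sum_{s=1}^S\sum_{t=1}^T (e_t^s - \bar e^s)^2\big)$. Hence

$$\hat\sigma^2_S = \frac{\hat\sigma^2}{\frac{1}{ST}\sum_s\sum_t (e_t^s - \bar e^s)^2} = \sigma^2\frac{\chi^2_{T-1}/T}{\chi^2_{S(T-1)}/(ST)} = \sigma^2 F_{T-1,\,S(T-1)}.$$

The mean of a $F_{d_1,d_2}$ random variable is $\frac{d_2}{d_2 - 2}$. Hence $E(\hat\sigma^2_{SMD}) = \sigma^2\frac{S(T-1)}{S(T-1)-2}$.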
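The following simulation sketch illustrates the $F$ result for the variance estimator. It is an illustration only: the function name, seed and sample sizes are assumptions, and the mean parameter is ignored because the centered residuals do not depend on it.

```python
import numpy as np

def simulate_smd_variance(sigma0_sq, T, S, reps, seed=0):
    """SMD variance estimates: sample statistic divided by the simulated scale factor."""
    rng = np.random.default_rng(seed)
    e = rng.standard_normal((reps, T))            # shocks behind the observed data
    sigma2_hat = sigma0_sq * e.var(axis=1)        # sample auxiliary statistic (variance MLE)
    e_sim = rng.standard_normal((reps, S, T))     # S simulated paths per replication
    scale = e_sim.var(axis=2).mean(axis=1)        # (1/(ST)) sum_s sum_t (e_t^s - ebar^s)^2
    return sigma2_hat / scale

T, S, sigma0_sq = 10, 5, 2.0
est = simulate_smd_variance(sigma0_sq, T, S, reps=20_000)
print(est.mean(), sigma0_sq * S * (T - 1) / (S * (T - 1) - 2))   # simulated vs theoretical mean
```

With $S = 5$ and $T = 10$ the theoretical mean is $\sigma^2 \cdot 45/43$, an upward bias that vanishes as $S \to \infty$, in contrast to the downward bias of the MLE.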


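LT: The LT is defined as

$$p_{LT}(\sigma^2|\hat\sigma^2) \propto 1_{\sigma^2 \geq 0}\exp\Big(-\frac{T}{2}\frac{(\sigma^2 - \hat\sigma^2)^2}{2\hat\sigma^4}\Big),$$

which implies

$$\sigma^2|\hat\sigma^2 \sim_{LT} \mathcal{N}\Big(\hat\sigma^2, \frac{2\hat\sigma^4}{T}\Big) \text{ truncated to } [0, +\infty[.$$

For $X \sim \mathcal{N}(\mu, \sigma^2)$ we have $E(X|X > a) = \mu + \sigma\frac{\phi(\frac{a-\mu}{\sigma})}{1 - \Phi(\frac{a-\mu}{\sigma})}$ (Mills-Ratio). Hence:

$$E_{LT}(\sigma^2|\hat\sigma^2) = \hat\sigma^2 + \sqrt{2/T}\,\hat\sigma^2\frac{\phi\big(\frac{0 - \hat\sigma^2}{\sqrt{2/T}\hat\sigma^2}\big)}{1 - \Phi\big(\frac{0 - \hat\sigma^2}{\sqrt{2/T}\hat\sigma^2}\big)} = \hat\sigma^2\Big(1 + \sqrt{2/T}\frac{\phi(-\sqrt{T/2})}{1 - \Phi(-\sqrt{T/2})}\Big).$$

Let $\kappa_{LT} = \sqrt{2/T}\frac{\phi(-\sqrt{T/2})}{1 - \Phi(-\sqrt{T/2})}$. We have $E_{LT}(\sigma^2|\hat\sigma^2) = \hat\sigma^2(1 + \kappa_{LT})$. The expectation of the estimator is

$$E\big(E_{LT}(\sigma^2|\hat\sigma^2)\big) = \sigma^2\frac{T-1}{T}(1 + \kappa_{LT})$$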
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from which we deduce the bias of the estimator 


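$$E\big(E_{LT}(\sigma^2|\hat\sigma^2)\big) - \sigma^2 = \sigma^2\Big(\frac{T-1}{T}\kappa_{LT} - \frac{1}{T}\Big).$$

The variance of the estimator is $2\sigma^4\frac{T-1}{T^2}(1 + \kappa_{LT})^2$ and the Mean-Squared Error (MSE) is

$$\sigma^4\Big(\frac{T-1}{T}\kappa_{LT} - \frac{1}{T}\Big)^2 + 2\sigma^4\frac{T-1}{T^2}(1 + \kappa_{LT})^2,$$

which is the squared bias of MLE plus terms that involve the Mills-Ratio (due to the truncation).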
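The Mills-ratio formula for the LT posterior mean can be verified against direct numerical integration of the truncated normal. The sketch below is illustrative: the helper names are hypothetical and the integration grid (upper limit of twelve standard deviations above the mean, trapezoid rule) is an arbitrary choice.

```python
import math
import numpy as np

def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def kappa_lt(T):
    """sqrt(2/T) * phi(-sqrt(T/2)) / (1 - Phi(-sqrt(T/2)))."""
    c = -math.sqrt(T / 2.0)
    return math.sqrt(2.0 / T) * norm_pdf(c) / (1.0 - norm_cdf(c))

def lt_posterior_mean(sigma2_hat, T):
    """Closed-form mean of N(sigma2_hat, 2*sigma2_hat^2/T) truncated to [0, inf)."""
    return sigma2_hat * (1.0 + kappa_lt(T))

def lt_posterior_mean_numeric(sigma2_hat, T, n=400_001):
    """Same mean by trapezoid integration of the unnormalized truncated density."""
    sd = math.sqrt(2.0 / T) * sigma2_hat
    x = np.linspace(0.0, sigma2_hat + 12.0 * sd, n)
    dens = np.exp(-0.5 * ((x - sigma2_hat) / sd) ** 2)
    w = np.ones(n)
    w[0] = w[-1] = 0.5                     # trapezoid end-point weights
    return float(np.sum(w * x * dens) / np.sum(w * dens))

print(kappa_lt(4), lt_posterior_mean(1.5, 4), lt_posterior_mean_numeric(1.5, 4))
```

Because $\kappa_{LT}$ depends only on $T$ and decays rapidly as $T$ grows, the truncation matters mainly in very small samples.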
SLT: The SLT is defined as 


Pslt(ct 2 |ct 2 ) oc l ff 2 > 0 exp - 


where 


(£2 2 Xg(T-i) 

T \ a ST 


= 1 ct 2 >0 exp - 


V 2 I a 2 / Xg (' r - 1 ) _ rp 

^ l [ xs g r ~ 1) ] 2 \ ST a 


2 T 2 

~2 _ Jl\_ ST jL _ eS\2 _ „2 Xg(T-l) 

OS Q / j rj~\ / j \^t ) Cj-i 

S = 1 £=1 


This yields the slightly more complicated formula 




Xs(t-i) 2a 4 ST 


ST ’ T Yo 


^5(T-1) 


and the posterior mean becomes 


a ST/xs(t- i) 


T V 2 

X S(T—1) 


®*lt(*V) = ^ 7 

XS(T -" ! t ( 

1 

\ V X S(T-1) 


Xs(T-l) 


Xs(T-l) 1 — $ (~y/T/2j %S(T- 1) 


Let «slt = y 7 2/T 


i_$(_yT 72 ) v 7 x| ( ^_i) Li xl 


(random). We can compute 


and the bias 


E (E SLT (a 2 |a 2 )) = 2 + - 2 V E(KS lt) 

E (E SLT (cr 2 |CT 2 )) - cr 2 = 0- 2 5( , T _ 2 ^ _ 2 + O’ 2 —7 ^ L E(k SLT ) 


which is the bias of SMD and the Mills-Ratio term that comes from taking the mean of the truncated normal 
rather than the mode. The variance is similar to the LT and the SMD 


^cr 4 m ——_- + 2ct 4 V(k S lt) + 4cr 4 —^ Cov(k S lT) -5 -)• 

1 1 1 “ X-S(T-1) 
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The extra term is due to kslt being random. We could simplify further noting that kslt = Klt : 


ST 


'•S(T-l) 


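$$E(\kappa_{SLT}) = \kappa_{LT}\frac{ST}{S(T-1)-2}, \qquad V(\kappa_{SLT}) = \kappa_{LT}^2\frac{2S^2T^2}{(S(T-1)-2)^2(S(T-1)-4)}$$

and

$$\mathrm{Cov}\Big(\kappa_{SLT}, \frac{ST}{\chi^2_{S(T-1)}}\Big) = \kappa_{LT}S^2T^2\,V\big(1/\chi^2_{S(T-1)}\big) = \kappa_{LT}\frac{2S^2T^2}{(S(T-1)-2)^2(S(T-1)-4)}.$$
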


The MSE is 
2 


a 


T - 1 


S(T — 1) — 2 T 
= 2cr 4 


E(«slt) 

2 


+ 2 a 4 m—+ 2ct 4 ¥(k S lt) + 4<t 4 ^—^Cov(/c S lt, ~y~ -) 

11 1 X-S{T- 1) 


+ Kl - 


1 


[S(T — 1) — 2 ] 2 T — 1 


iT-lf 

J 1 2 


E(«g LX + 


4cr 4 


T- 1 


S(T-l)-2 T 


E(kslt) 


MSE of SMD 


+2ct 4 ¥(/cslt) + 4<t 4 ——Cov(k S ltj —t ——)■ 


rp2 


X-S(T- 1) 


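RS: The auxiliary statistic for each draw of simulated data is matched to the sample auxiliary statistic.
Thus, $\hat m = m^b + \sigma^b\bar e^b$. Thus conditional on $\hat m$ and $\sigma^{2,b}$, $m^b = \hat m - \sigma^b\bar e^b \sim \mathcal{N}(\hat m, \sigma^{2,b}/T)$. For the variance,
$\hat\sigma^{2,b} = \sigma^{2,b}\sum_t (e_t^b - \bar e^b)^2/T$. Hence

$$\sigma^{2,b} = \frac{\hat\sigma^2}{\sum_t (e_t^b - \bar e^b)^2/T} = \sigma^2\frac{\sum_t (e_t - \bar e)^2/T}{\sum_t (e_t^b - \bar e^b)^2/T} \sim \mathrm{inv}\Gamma\Big(\frac{T-1}{2}, \frac{T\hat\sigma^2}{2}\Big).$$

Note that $p_{BC}(\sigma^2|\hat\sigma^2) \sim \mathrm{inv}\Gamma\big(\frac{T-3}{2}, \frac{T\hat\sigma^2}{2}\big)$ under a flat prior; the Jacobian adjusts the RS posterior to match
the true posterior. To compute the posterior mean, we need to compute the Jacobian of the transformation:

$$|\hat\psi_\theta|^{-1} = \Big|\frac{\partial\sigma^{2,b}}{\partial\hat\sigma^2}\Big|.$$

Since $\sigma^{2,b} = \frac{T\hat\sigma^2}{\sum_t (e_t^b - \bar e^b)^2}$, we have $|\hat\psi_\theta|^{-1} = \frac{T}{\sum_t (e_t^b - \bar e^b)^2}$.

Under the prior $p(\sigma^{2,b}) \propto 1$, the posterior mean without the Jacobian transformation is

$$\bar\sigma^2 = \sigma^2\frac{1}{B}\sum_{b=1}^B \frac{\sum_t (e_t - \bar e)^2/T}{\sum_t (e_t^b - \bar e^b)^2/T} \xrightarrow{B \to \infty} \hat\sigma^2\frac{T}{T-3}.$$

The posterior mean after adjusting for the Jacobian transformation is

$$\bar\sigma^2_{RS} = \frac{\sum_{b=1}^B \sigma^{2,b}\cdot\frac{T}{\sum_t (e_t^b - \bar e^b)^2}}{\sum_{b=1}^B \frac{T}{\sum_t (e_t^b - \bar e^b)^2}} = T\hat\sigma^2\frac{\sum_b (z^b)^2}{\sum_b z^b} = T\hat\sigma^2\frac{\frac{1}{B}\sum_b (z^b)^2}{\frac{1}{B}\sum_b z^b},$$

where $1/z^b = \sum_t (e_t^b - \bar e^b)^2$. As $B \to \infty$, $\frac{1}{B}\sum_b (z^b)^2 \xrightarrow{p} E[(z^b)^2]$ and $\frac{1}{B}\sum_b z^b \xrightarrow{p} E[z^b]$. Now $z^b \sim \mathrm{inv}\chi^2_{T-1}$
with mean $\frac{1}{T-3}$ and variance $\frac{2}{(T-3)^2(T-5)}$, giving $E[(z^b)^2] = \frac{1}{(T-3)(T-5)}$. Hence as $B \to \infty$, $\bar\sigma^2_{RS} = \hat\sigma^2\frac{T}{T-5} = \bar\sigma^2_{BC}$.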
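The role of the Jacobian weights in this example can be seen in a short simulation. The sketch below is an illustration under the flat prior only; the function name and the values of $T$, $B$ and $\hat\sigma^2$ are assumptions. Each draw $\sigma^{2,b}$ matches the sample statistic exactly, and the draws are then averaged with and without the weight $T/\sum_t (e_t^b - \bar e^b)^2$.

```python
import numpy as np

def rs_variance_posterior_means(sigma2_hat, T, B, seed=0):
    """Unweighted and Jacobian-weighted means of the RS draws of sigma^2 (flat prior)."""
    rng = np.random.default_rng(seed)
    e = rng.standard_normal((B, T))
    ssq = ((e - e.mean(axis=1, keepdims=True)) ** 2).sum(axis=1)   # 1 / z^b, chi2 with T-1 df
    sigma2_b = T * sigma2_hat / ssq      # draw that equates simulated and sample statistics
    jac = T / ssq                        # Jacobian weight, proportional to sigma2_b
    unweighted = sigma2_b.mean()
    weighted = np.sum(jac * sigma2_b) / np.sum(jac)
    return unweighted, weighted

unweighted, weighted = rs_variance_posterior_means(sigma2_hat=1.5, T=20, B=200_000)
print(unweighted, 1.5 * 20 / 17)   # without the Jacobian: sigma2_hat * T / (T - 3)
print(weighted, 1.5 * 20 / 15)     # with the Jacobian:    sigma2_hat * T / (T - 5)
```

In this run the weighted average should approach the BC posterior mean $\hat\sigma^2 T/(T-5)$, while the unweighted average approaches $\hat\sigma^2 T/(T-3)$.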


Derivation of the Bias Reducing Prior: The MLE has expectation $E(\hat\sigma^2) = \sigma^2 - \frac{1}{T}\sigma^2$ and
variance $V(\hat\sigma^2) = 2\sigma^4\big(\frac{1}{T} - \frac{1}{T^2}\big)$. Since the auxiliary parameters coincide with the parameters of interest,
Wgtp{6) and ’Vggiij}(0) = 0. For Z ~ Af(0, 1), H(i>; cr 2 ) = v / 2ct 2 (1 — ^)Z, Thus d a 2 A(v; cr 2 ) = \/2(l— ^)Z, a s = 

'This holds because a 2 ’ b (a 2,b ) = o 2 so that |do :2 ’ (l /d<T 2 ’ f ’| _1 = |rfcr 2,i> / c/ct 2 |. 


v / 2cr 2 (l — if)(Z — Z s ). The terms in the asymptotic expansion are therefore 

d a *A[v'^)a' = 2a 2 (l-^fZ s (Z-Z s )^Hd^A(v s -a 2 )a s ) = -a 2 2(l-^) 2 
V(a s ) = 4a 4 (l-l) 2 
cov(a s ,a s ') = 2(1-^) 2 cr 4 


(! ~ + ^~g^cov{a s ,a s 


) = 


-( 1 -^) 2 ( 4 ( 1 -^) + 2 ^) 


3(5-1) 


Noting that $|\partial_{\hat\sigma^2}\sigma^{2,b}| \propto \sigma^{2,b}$, it is analytically simpler in this example to solve for the weights directly, i.e.
$w(\sigma^2) = \pi(\sigma^2)|\partial_{\hat\sigma^2}\sigma^{2,b}|$, rather than the bias reducing prior $\pi$ itself. Thus the bias reducing prior satisfies


d a 2 w(a 2 ) = 


- 2 a 2 (l-A ) 2 


‘(1 - ^) 2 ( 4(1 - j ) + 2 ^) ff2 4 ( 1 - |) + 2 ^ s i 

Taking the integral on both sides we get:

$$\log(w(\sigma^2)) \propto -\log(\sigma^2) \;\Rightarrow\; w(\sigma^2) \propto \frac{1}{\sigma^2} \;\Rightarrow\; \pi(\sigma^2) \propto \frac{1}{\sigma^4},$$

which is the Jeffreys prior if there is no re-weighting and the square of the Jeffreys prior when we use the
Jacobian to re-weight. Since the estimator for the mean was unbiased, $\pi(m) \propto 1$ is the prior for $m$.

The posterior mean under the Bias Reducing Prior $\pi(\sigma^{2,s}) = 1/\sigma^{4,s}$ is the same as the posterior without
weights but using the Jeffreys prior $\pi(\sigma^{2,s}) = 1/\sigma^{2,s}$:

$$\bar\sigma^2_{RS} = \frac{\sum_{s=1}^S \sigma^{2,s}(1/\sigma^{2,s})}{\sum_{s=1}^S 1/\sigma^{2,s}} = \frac{S}{\sum_{s=1}^S 1/\sigma^{2,s}} = \sigma^2\frac{\sum_{t=1}^T (e_t - \bar e)^2/T}{\sum_{s=1}^S\sum_{t=1}^T (e_t^s - \bar e^s)^2/(ST)} = \hat\sigma^2_{SMD}.$$
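The equality above is an exact algebraic identity (a harmonic mean of the RS draws), which the sketch below confirms on simulated shocks. The function name and inputs are hypothetical; the same shocks are used for the RS draws and for the SMD estimate so that the two can be compared directly.

```python
import numpy as np

def rs_with_jeffreys_weights_and_smd(sigma2_hat, T, S, seed=0):
    """RS posterior mean with weights 1/sigma^{2,s}, and the SMD estimate from the same shocks."""
    rng = np.random.default_rng(seed)
    e = rng.standard_normal((S, T))
    msq = e.var(axis=1)                   # (1/T) sum_t (e_t^s - ebar^s)^2 for each s
    sigma2_s = sigma2_hat / msq           # RS draws sigma^{2,s}
    w = 1.0 / sigma2_s                    # weights implied by the prior 1 / sigma^{2,s}
    rs_mean = np.sum(sigma2_s * w) / np.sum(w)
    smd = sigma2_hat / msq.mean()         # SMD: sample statistic over the average simulated one
    return rs_mean, smd

rs_mean, smd = rs_with_jeffreys_weights_and_smd(sigma2_hat=1.5, T=10, S=500)
print(rs_mean, smd)   # identical up to floating point rounding
```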
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D.2 Further Results for Dynamic Panel Model with Fixed Effects 


Table 4: Dynamic Panel $\rho = 0.9$, $\beta = 1$, $\sigma^2 = 2$
Mean over 1000 replications

                        MLE      LT       SLT      SMD      ABC      RS       Bootstrap
$\hat\rho$:      Mean   0.751    0.751    0.895    0.898    0.889    0.899    0.751
                 SD     0.030    0.030    0.026    0.025    0.025    0.025    0.059
                 Bias  -0.149   -0.149   -0.005   -0.002   -0.011   -0.001   -0.149
$\hat\beta$:     Mean   0.934    0.934    0.998    1.000    0.996    1.000    0.935
                 SD     0.070    0.071    0.074    0.073    0.073    0.073    0.139
                 Bias  -0.066   -0.066   -0.002    0.000   -0.004    0.000   -0.065
$\hat\sigma^2$:  Mean   1.857    1.865    1.972    1.989    2.054    2.097    1.858
                 SD     0.135    0.141    0.145    0.145    0.151    0.153    0.269
                 Bias  -0.143   -0.135   -0.028   -0.011    0.054    0.097   -0.142
S                       -        -        500      500      1        1        500
B                       -        500      500      -        500      500      -

See note to Table 3.
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